Re: [DRBD-user] Mount and use disk while Inconsistent?

2018-09-21 Thread Dan Ragle

Good to know, thanks.

Cheers!

Dan

On 9/21/2018 1:55 PM, digimer wrote:

Yup, its fine.

Note though; If the UpToDate node goes offline, the Inconsistent node will force itself to Secondary and be unusable. So while it's 
possible to mount and use, be careful that whatever is being used can handle having the storage ripped out from under it.


digimer


On 2018-09-21 01:52 PM, Dan Ragle wrote:

Just double checking. Is it ok to have a dual-primary setup where both nodes 
are primary while one is still syncing?

[node1]# drbdadm status
r0 role:Primary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  node2.mydomain.com role:Primary
    volume:0 replication:SyncSource peer-disk:Inconsistent done:99.46
    volume:1 peer-disk:UpToDate

First time I've seen it in my testing. Nothing complained about it so I *think* 
it's ok ...
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user




___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Mount and use disk while Inconsistent?

2018-09-21 Thread Dan Ragle

Just double checking. Is it ok to have a dual-primary setup where both nodes 
are primary while one is still syncing?

[node1]# drbdadm status
r0 role:Primary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  node2.mydomain.com role:Primary
volume:0 replication:SyncSource peer-disk:Inconsistent done:99.46
volume:1 peer-disk:UpToDate

First time I've seen it in my testing. Nothing complained about it so I *think* 
it's ok ...
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD9: full-mesh and managed resources

2016-08-18 Thread dan
On Thu, Aug 18, 2016 at 6:03 AM, Veit Wahlich  wrote:
> But the shortest link is not guaranteed. Especially after recovery from
> a network link failure.
> You might want to monitor each node for the shortest path.

Simplest solution here is to overbuild.  If you are going to do a
3-node 'full-mesh' then you should consider 10G ethernet (a melanox w/
cables on ebay is about US$20 w/ cables!).  Then you just enable STP
on all the bridges and let it be.  If you are taking 2 hops, that
should still be well over the transfer rates you need for such a small
cluster and STP will eventually work itself out.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdmanage init with zfs storage?

2016-08-16 Thread dan
On Tue, Aug 16, 2016 at 1:36 AM, Roland Kammerer
 wrote:
> The control volume itself is always on LVM and there is currently no way
> to change that. It's 2 x 4M. I don't see a reason to change that.

I'm running a ZFS so LVM for the cluster is on top of that.  It's
fine, but an extra layer and whatever the LVM guys don't like about
/dev/zd* for pv's.  I've found people using loop mounts to get around
this, but I think that is double clumbsy, zfs zpool to losetup to lvm
:/

>
> You can put all your data volumes on ZFS (thin, thick, configurable
> block size and what not). Documented in the drbd9 user guide.

this is working fine, I configured the nodes to use zfs storage driver
and that works perfectly.  It's just the drbdpool volume that is
convoluted and I was hoping for a more elegant solution.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] drbdmanage init with zfs storage?

2016-08-16 Thread dan
Hello!  First off, I'm new to drbd, or rather I haven't used it in a
very very long time.

I'm looking for a way to:
a) init a new drbd cluster with drbdmanage backed by zfs (instead of lvm)
or
b) init a new drbd cluster with drbdmanage and then change the control
volume over to zfs afterwards.

I can't find documentation on either possibility

for reference, this is what I'm trying to accomplish
'hyperconverged' proxmox 4.2 cluster, zfs as the system volume
management with drbd syncing a zvol, running lvm on top of that zvol
for proxmox live migration to work.

I really want to stick with zfs because of it's checksums, and for
some data (backups and ISO/templates) compression and deduplication.

Thanks.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Is this configuration a bad idea?

2015-05-07 Thread Dan Craciun
Hi,

I have 2 servers running about 10 OpenVZ containers and 4 KVM VMs, using
Proxmox as a frontend.

Both servers have 1TB hdds, mirrored using zfs.

I'm considering adding high availability to this setup.

My idea: configure a third server as a backup and use DRBD to mirror the
data from both servers over the network.

Most guides I've seen on the Internet assume you're going to use full
disks for DRBD mirrors.

In this case, I would create a partition on each server (200-300 GB) and
use DRBD to constantly mirror data on the third server.

Is this setup possible? The third server would have 2 different DRBD
partitions.

What do you recommend running on top of the DRDB partitions? ext4?

Thank you.

Best regards,
Dan
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd: refactor use of first_peer_device()

2014-07-12 Thread Dan Carpenter
[ For some reason I was looking at old warnings and this showed up.
  Sorry for sending these a long time after the fact.  - dan ]

Hello Lars Ellenberg,

This is a semi-automatic email about new static checker warnings.

The patch 44a4d551846b: drbd: refactor use of first_peer_device()
from Nov 22, 2013, leads to the following Smatch complaint:

drivers/block/drbd/drbd_nl.c:688 drbd_set_role()
 error: we previously assumed 'peer_device' could be null (see line 560)

drivers/block/drbd/drbd_nl.c
   559  struct drbd_peer_device *const peer_device = 
first_peer_device(device);
   560  struct drbd_connection *const connection = peer_device ? 
peer_device-connection : NULL;
   ^^^
Check.

   561  const int max_tries = 4;
   562  enum drbd_state_rv rv = SS_UNKNOWN_ERROR;
   563  struct net_conf *nc;
   564  int try = 0;
   565  int forced = 0;
   566  union drbd_state mask, val;
   567  

[ snip ]

   684  
   685  if (device-state.conn = C_WF_REPORT_PARAMS) {
   686  /* if this was forced, we should consider sync */
   687  if (forced)
   688  drbd_send_uuids(peer_device);
^^^
Dereferenced inside the function.

   689  drbd_send_current_state(peer_device);
   690  }

regards,
dan carpenter
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Considerations for using bcache on top of DRBD

2014-02-19 Thread Dan Barker
 -Original Message-
 From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
 boun...@lists.linbit.com] On Behalf Of Andrew Martin
 Sent: Tuesday, February 18, 2014 2:14 PM
 To: Berto Bermudez
 Cc: drbd-user@lists.linbit.com
 Subject: Re: [DRBD-user] Considerations for using bcache on top of DRBD
 
 - Original Message -
  From: Berto Bermudez be...@momar.com
  To: Andrew Martin amar...@xes-inc.com, drbd-user@lists.linbit.com
  Sent: Monday, February 17, 2014 2:06:33 PM
  Subject: RE: [DRBD-user] Considerations for using bcache on top of DRBD
 
  You didn't indicate where in your setup the SSDs would sit. I would
 suggest
  you look at also replicating the ssd contents using drbd and layering
 bcache
  on top of that, so that in the event of failover you don't have the
 penalty
  of cold caches.
 
  HDDs -- md/raid -- LVM -- DRBD -- bcache |-- ext4
SSD-- DRBD -- bcache |
 
  There was a talk about something similar, but using flashcache
  http://www.youtube.com/watch?v=l910kiEuHOM
 
  Berto
  Momar, Inc.
 Hi Berto,
 
 I was intending to keep the SSDs as the top layer, above DRBD. If I
 understand
 correctly, you're suggesting creating a separate DRBD device on top of the
 SSD and then passing that DRBD device to bcache as the cache device, thus
 propagating the cache changes between nodes?
 backing device: HDDs -- md/raid -- LVM -- DRBD -- bcache -- ext4
 cache device: SSDs -- md/raid -- DRBD -- bcache
 
 Could this create a problem where during failover the cache device is held
 open
 and thus can't be promoted to primary on the other node? In writeback mode
 this
 would prevent the device from being usable until the cache device could be
 failed over successfully.

In the Video, Florian discusses this. You put both the SSD and the HDD in the 
same resource (Requires version 8.4). In that way, the HDD and SSD remain 
linked, as intended. Putting them in separate resources, a la version 8.3, 
would be a mess as you note.

Dan
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd Input/output error

2014-02-15 Thread Dan Barker
 -Original Message-
 From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
 boun...@lists.linbit.com] On Behalf Of Piotr Kloc
 Sent: Friday, February 14, 2014 7:49 PM
 To: drbd-user@lists.linbit.com
 Subject: [DRBD-user] drbd Input/output error
 
 Hello !
 I have problem with drbd
 
 primary
 
 [root@wirt ~]# cat /proc/drbd
 version: 8.3.13 (api:88/proto:86-96)
 GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted,
 2012-10-09 12:47:51
 
  1: cs:StandAlone ro:Primary/Unknown ds:Diskless/DUnknown   r-
 ns:0 nr:72 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 [root@wirt ~]#
 
 and secondary
 
 [root@wirt2 ~]# cat /proc/drbd
 version: 8.3.13 (api:88/proto:86-96)
 GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted,
 2012-10-09 12:47:51
 
  1: cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown C r-
 ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 [root@wirt2 ~]#
 
 
 
 on primary i got
 
 [root@wirt ~]# lvs
   /dev/drbd1: read failed after 0 of 4096 at 1892425662464: Input/output
 error
   /dev/drbd1: read failed after 0 of 4096 at 1892425728000: Input/output
 error
   /dev/drbd1: read failed after 0 of 4096 at 0: Input/output error
   /dev/drbd1: read failed after 0 of 4096 at 4096: Input/output error
   LV   VG   Attr   LSize   Pool Origin Data%  Move Log Cpy%Sync
 Convert
   root vg0  -wi-ao  50.00g
   swap vg0  -wi-ao  16.00g
   tmp  vg0  -wi-ao  25.00g
   vm1  vg1  -wi-ao   1.17t
   vm2  vg1  -wi-ao 400.00g
 
 
 I have LVM on DRBD and DRBD is on the /dev/md2 RAID1 system

No, drbd is not on anything: cs:StandAlone ro:Primary/Unknown 
ds:Diskless/DUnknown

You have not shown the resource definition files, so we can only guess. What 
has been done after / filled up? I would think at least freeing some space and 
a reboot, but you don't say.

 
  device minor 1;
 disk /dev/md2;
 address IP:7788;
 meta-disk internal;
 
 
 
 There was  temporary out of space on / partition and then we got this
 errors
 
 How we can fix this now ?
 Can I run fsck on  /dev/drbd1 ?

No, you can't do anything against a diskless resource.

 
 
 Piotr

The I/O errors are expected - drbd's devices are not present or not available. 
LVM may have grabbed them if your filters are not set up correctly. Again, 
please provide more doc. Output from mount, your drbd resource definition files 
and your LVM filters.

Dan

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd not working with high mtu?

2014-02-01 Thread Dan Barker
 -Original Message-
 From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
 boun...@lists.linbit.com] On Behalf Of Harka Gyozo SA
 Sent: Saturday, February 01, 2014 1:32 AM
 To: drbd-user@lists.linbit.com
 Subject: [DRBD-user] drbd not working with high mtu?
 
 Hi!
 
 Anyone knows about an issue with mtu?
 I tried to increase it to max ( there are two nics with a crossover cable,
 so
 no switch, router etc.. ). I found that max mtu supported by my card's
 driver
 is 7200.
 ssh, ping (also with big packages), nfs works.
 
 drbd keeps saying:] block drbd1: [drbd1_worker/16675] sock_sendmsg time
 expired, ko = ...
 
 If I decrease the mtu to 6000, error message stops, and I can see the
 syncing
 in the /proc/drbd (While the mtu is high it's stalled).
 
 If I increase the mtu to 6200, kernel message returns, communication
 stalled.
 
 This is a bug? Or maybe I should increase buffer sizes in the config?
 [ version: 8.3.7 (api:88/proto:86-91) ]
 
 
 HARKA Győző
 ---

ssh, ping, nfs all will fragment packets.

Try ping with a 10K size and the don't fragment switch set. Lower the size 
until it works.

ping -M do -s 6800 -c 1 host

Dan
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Antwort: Re: proto c - corrupt files - directories missing

2014-01-22 Thread Dan Barker


 -Original Message-
 From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
 boun...@lists.linbit.com] On Behalf Of Bauer, Stefan (IZLBW Extern)
 Sent: Wednesday, January 22, 2014 10:11 AM
 To: drbd-user@lists.linbit.com
 Subject: Re: [DRBD-user] Antwort: Re: proto c - corrupt files -
 directories missing
 
 
 How do you trigger the email?
 I don't see anything in the manpage for the action OOS blocks.
 I ionly see local-io-error and stuff like this.
 
 Stefan
 ___

My global_common.conf contains:

out-of-sync  /usr/lib/drbd/notify-out-of-sync.sh use...@comain.ltd;

I did not look in the doc. Gotta run out.

Dan
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] data mismatch when primary/secondary are both up2date

2013-11-29 Thread Dan Barker

 In my cluster(node1/node2) with drbd, the state in /proc/drbd is   
 primary/secondary up2date/up2date, but when I change primary to node2 , the 
 file that existed on node1 can not be found on node2.
Then I do drbdadm verify drbd0 to verify and resync the data,  node2's data 
returned to be OK.

 I am wondering how the problem occurs and how I can avoid it? 

 BTW: I build a PV/VG/LV on drbd0, and drbd0 is built on a LV too.  Is this 
 the reason?

 Thanks.


This sounds as if you are mixing drbd device accesses and backing device 
accesses, but insufficient details are given. Also, the verify succeeding seems 
to say something else may be going on.

Please provide more information, such as your LVM setup, drbd config, file 
systems and the test cases.

Dan in Atlanta

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Fwd: how to place secondary node in maintenance mode

2013-09-24 Thread Dan Barker
Wow! Those are some old versions!

Just “drbdadm disconnect all” on the Primary node before the maintenance and 
“drbdadm connect all” afterwards. All else is magic!

Dan

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Gabriel Sosa
Sent: Tuesday, September 24, 2013 11:08 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Fwd: how to place secondary node in maintenance mode

We have a pretty standard setup
MASTER --- SLAVE (STANDBY)

both nodes are in sync without much issue now but during this week our DC will 
perform some maintenance tasks that might incur in some connectivity issues 
between both nodes.
How can put the slave node in maintenance (or any other mode) in order to avoid 
any split brain situation?
I've been reading [1] and [2] but I can't find a clear answer to this.
should I just take down the service on the SLAVE node performing a *service 
drbd stop* or instead disable the resource using *drbdadm down resourcename*

versions:
MASTER - version: 8.0.16 (api:86/proto:86)
SLAVE - version: 8.3.15 (api:88/proto:86-97)
Thanks

[1] http://www.drbd.org/users-guide/ch-admin.html
[2] http://www.geekpeek.net/drbd-management-command-usage/


--
Gabriel Sosa
Sometimes the questions are complicated and the answers are simple. -- Dr. Seuss
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Unable to sync new machine

2013-09-15 Thread Dan Barker
dmesg should show why earth won't stay WFC. The cat /proc/drbd just shows the 
state, it doesn't show the why.
vulcan won't connect because earth is not WFC.

Dan

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Hans Lammerts
Sent: Saturday, September 14, 2013 8:48 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Unable to sync new machine

Hi there,

I've been using drbd for approx. 2 years now, and never had any big problems.
Recently, one of two machines in my cluster crashed, and I had to reinstall it 
completely.
Now I seem to be unable to sync that second machine with the first one.

The situation:

I'm using drbd 8.4.0
Machine 1 is called earth, machine 2 is called vulcan.
Earth is the survivor half of my cluster, and vulcan had to be rebuilt.

The actions I've taken:
After compiling drbd on vulcan, I copied the resource file from /etc/drbd.d 
from earth to
vulcan in the same place. In this case, the resource file for mysql, which 
looks like this :


resource mysql {
  protocol C;
  syncer {
rate 4M;
  }
  startup {
wfc-timeout 15;
degr-wfc-timeout 60;
  }

handlers {

split-brain /usr/lib/drbd/notify-split-brain.sh 
j.lamme...@chello.nlmailto:j.lamme...@chello.nl;
}

net {
cram-hmac-alg sha1;
shared-secret xxx;

verify-alg sha1;



after-sb-0pri discard-zero-changes;

after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
  on vulcan {
device /dev/drbd0;
disk /dev/sda5;
address 192.168.0.15:7788;
meta-disk internal;
  }
  on earth {
device /dev/drbd0;
disk /dev/sda5;
address 192.168.0.5:7788;
meta-disk internal;
  }
}
Then, I created the device meta-data on vulcan:
drbdadm create-md mysql

After (re)starting drbd on both machines, the cat /proc/drbd shows this:

earth:
[root@earth ~]# cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@earth, 
2013-09-07 17:35:35
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-
ns:0 nr:0 dw:19360172 dr:6497416 al:138 bm:50 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b 
oos:216372

vulcan:
[root@vulcan drbd.d]# cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@vulcan, 
2013-09-12 16:25:17
0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   rs
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:4652876

So both Standalone, and Primary/Unknown vs. Secondary/Unknown.

Having Googled for this situation and its solution, I tried the solution as 
described in the Linbit DRBD manual,
but stopping short of 5.4. The only thing that happens (as far as I can see) 
is that earth briefly goes into
the WFConnection state, and nothing else.

I tried the following as well:
On earth:
drbdadm connect all
On vulcan:
drbdadm -- --discard-my-data connect all (or drbdadm connect -discard-my-data 
mysql, can't remember exactly)

But this did not get the synching of the resource started as well.

I'm out of ideas, and can't really find anything searching Google different 
from what I have already tried.

So, please, if anyone can give me a clou on how to resolve this situation, 
preferably without losing any data on earth,
I would be most grateful.

Thanks,
Hans

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] generation identifiers

2013-09-08 Thread Dan Barker
See below: 

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of FDS | Forensik Data 
Services
Sent: Friday, September 06, 2013 9:06 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] generation identifiers

Hi there,

I am involved in the aftermath of a split brain event (drbd 8.3.8). log files 
have given us several entries for drbd_sync_handshakes:


1.   Feb 28 07:10:04 san02 kernel: [ 1435.507176] block drbd1: self 
384ACFA6DDE7F305:6EC706A11FB6AAEB:AD2868A6C65F20AA:D5016EA9D8E57175 
bits:28397184 flags:0

2.   Feb 28 07:10:04 san02 kernel: [ 1435.507472] block drbd0: self 
8FFC8087A5A8ACDF:128A2EF7246101ED:00C719D52E8C2482:4EEC506B23021307 bits:225135 
flags:0

3.   Feb 28 07:11:36 san01 kernel: [4427691.562720] block drbd1: self 
16DA13A883B368DA:6EC706A11FB6AAEA:AD2868A6C65F20AA:D5016EA9D8E57175 bits:3072 
flags:0

4.   Feb 28 07:11:36 san01 kernel: [4427691.562923] block drbd0: self 
2DBF62FFC79BF9F4:128A2EF7246101EC:00C719D52E8C2483:4EEC506B23021307 
bits:53722300 flags:0

5.   Feb 28 07:17:46 san02 kernel: [ 1897.442652] block drbd0: self 
8FFC8087A5A8ACDE:8A4BA5176FB14BAD:128A2EF7246101ED:00C719D52E8C2482 bits:21858 
flags:0

6.   Feb 28 07:17:46 san02 kernel: [ 1897.443239] block drbd1: self 
384ACFA6DDE7F304:E9797096C25A015C:6EC706A11FB6AAEB:AD2868A6C65F20AA 
bits:24262845 flags:0

7.   Feb 28 07:19:19 san01 kernel: [4428153.502129] block drbd0: self 
8A4BA5176FB14BAC::00C719D52E8C2483:4EEC506B23021307 
bits:49588595 flags:0

8.   Feb 28 07:19:19 san01 kernel: [4428153.502363] block drbd1: self 
E9797096C25A015C::AD2868A6C65F20AA:D5016EA9D8E57175 bits:0 
flags:0

The Cluster consists of 2 nodes (san01 und san02) with two block devices 
(drbd01 und drbd02) running in dual primary with OCFS2.

Analyzing Generation Identifiers oft he first handshake:

· Current UUIDs (1.-4.) differ due to split brainsituation.

·  BitMap UUIDs (1.-4.) are nearly identical. Only the last digit 
differs from A-D. First Historical UUIDs (1.-4.) with drbd1 are identical but 
differ for block drbd0. Again only the last digit (2 vs 3).

· Second Historical UUID (1.-4.) for both block-Devices are identical 
meaning both devices have the same data set, or?

Questions:

· Reason and meaning for differing last digit UUIDs are unknown. No 
information in the doc.
 I don't know

· How can I see when the data sets were sync for the last time?
 Careful analysis of the logs

· Meaning of values for flags fe 0 or 2?
  I don't know

· Is it possible to decide which resource (drbd0 or drbd1) was primary 
by comparing UUIDs?
 You had dual primary - they were both primary. You'll have to look at the 
 data itself. For example if one file on one of the standalone drbd0 exists 
 with a newer date, that file is likely more valid than the other file. 
 Depending on how and why you had dual primary without proper fencing in the 
 first place, it may be trivial to determine which to keep and which to 
 discard-my-data. If they are dual primary solely to provide live 
 migration - AND - no live migrations were taking place at the time of the 
 split brain event, you are fairly safe using the resource (or disk image) 
 to which a VM was pointed at the time of the event. If the file systems 
 were shared by multiple hosts, you are stuck with file by file analysis - 
 and there are no promises that the newest file will contain all the data. 
 You may need to do a row by row analysis depending on your applications.

 Again, without knowing the sharing and application logic in detail about 
 your specific systems, split-brain analysis is impossible.

 hth and good luck!

 (Get fencing working!, Get a current drbd installed!!)

 Dan

Aprecciating your help.

--
Mit freundlichen Grüßen
Ing. Mag. Horst Greifeneder
(fds)

FDS | Forensik Data Services.
Better secure than sorry!

Schenkelbachweg 32. A - 4600  W e l s.
tel. 07242. 777 15. fax. 07242. 777 16.
mailto:off...@fds.at

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] csums-alg seems not working on my cluster....

2013-09-05 Thread Dan Barker
That's a very difficult way to go about setting up internal metadata. Normally, 
just we create the metadata on the raw device (/dev/sdb) and then create the 
filesystem on the drbd device (/dev/sql_data1). No math!

You did not appear to specify a syncer rate. I thought the default was much 
higher than 2040K, but that's the target for the sync operation. Why not set 
the synch rate up to some reasonable percentage (most all for initial sync, 
maybe 30% of your bandwidth thereafter) of the available bandwidth. You say 
low without defining it. The displays appear syncher rate constrained.

Also, you don't have to do a full sync on initially empty disks. That's in the 
doc under clear-bitmap and/or new-current-uuid.

You can modify the syncher rate while running, or in the config files and then 
adjust the resources.

Dan

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lafaille Christophe
Sent: Thursday, September 05, 2013 9:37 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] csums-alg seems not working on my cluster

Hi All,

I need to use very low bandwith network between 2 machines using drbd and I try 
using csums-alg/verify-alg.

But I've same duration with or without csums-alg !

Execution with csums-alg:
[root@sms246105 drbd.d]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@rh63_build, 
2013-01-10 09:57:53

 1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r-
ns:0 nr:512 dw:512 dr:147968 al:0 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f 
oos:4904384
[] sync'ed:  3.0% (4788/4932)M
finish: 0:15:17 speed: 5,332 (5,284) want: 2,040 K/sec

Execution without csums-alg:
[root@sms246105 drbd.d]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@rh63_build, 
2013-01-10 09:57:53

 1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r-
ns:0 nr:53760 dw:53760 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f 
oos:4819904
[] sync'ed:  1.2% (4704/4756)M
finish: 0:14:52 speed: 5,376 (5,376) want: 2,040 K/sec
I don't know where is the problem... is csums-alg usable only in a more recent 
version of DRBD (like 8.4.3 or 8.4.4) ?
I've built drbd packages from source, perhaps I need to specify an option in 
order to have csums-alg facility (I'll check for this) ?

I've put csums-alg in net section and in some web pages I've found a syncer 
section with csums-alg (seems no more available in 8.4.x versions).
== what's the good place ?

On both machines, I do this sequence:
# /etc/init.d/drbd stop
# delete all partition on /dev/sdb and create a 5GB (for my tests, real size is 
around 300GB) partitions with fdisk
# partprobe /dev/sdb
# dd if=/dev/zero of=dev/sdb1 bs=4096   == to initialize disk content
# mkfs.ext3 -j -m 0 -b 4096 /dev/sdb1
# PARTSIZE=`sfdisk -s /dev/sdb1 | xargs -i echo {} 1024 / 1024 / p | dc`
# NEWSIZE=$[${PARTSIZE}-2]
# resize2fs /dev/sdb1 ${NEWSIZE}G
# e2fsck -f /dev/sdb1
# /etc/init.d/drbd start
# /sbin/drbdadm create-md sqldata
# /sbin/drbdadm up sqldata
On one machine: # /sbin/drbdadm --force primary sqldata
The file  /etc/drbd.d/sqldata.res :
resource sqldata {
device /dev/drbd_sqldata minor 1;
disk   /dev/sdb1;
meta-disk  internal;
on sms246104 {
address 135.117.246.104:7788;
}
on sms246105 {
address 135.117.246.105:7788;
}
}
The file /etc/drbd.d/global_common.conf :
global {
usage-count yes;
dialog-refresh 1;
minor-count 5;
}
common {
handlers {
pri-on-incon-degr /usr/lib/drbd/notify-pri-on-incon-degr.sh; 
/usr/lib/drbd/notify-emergency-reboot.sh; echo b  /proc/sysrq-trigger ; reboot 
-f;
pri-lost-after-sb /usr/lib/drbd/notify-pri-lost-after-sb.sh; 
/usr/lib/drbd/notify-emergency-reboot.sh; echo b  /proc/sysrq-trigger ; reboot 
-f;
local-io-error /usr/lib/drbd/notify-io-error.sh; 
/usr/lib/drbd/notify-emergency-shutdown.sh; echo o  /proc/sysrq-trigger ; halt 
-f;
split-brain /usr/lib/drbd/notify-split-brain.sh root;
}

startup {
wfc-timeout 15;
}

options {
}

disk {
on-io-error detach;
c-plan-ahead 20;
c-fill-target 50k;
c-min-rate 250k;
c-max-rate 2M;
}

net {
timeout 60;
ping-int 6;
after-sb-0pri discard-younger-primary;
after-sb-1pri discard-secondary;
after-sb-2pri call-pri-lost-after-sb;
ping-timeout 60;
protocol C;
cram-hmac-alg sha1;
shared-secret TestHA;
csums-alg sha1;
verify-alg sha1;
}
}

Traces in /var/log/kern.log :
Sep  5 13:03:16 sms246104 kernel: events: mcg drbd: 2
Sep  5 13:03:16 sms246104 kernel: drbd: initialized. Version: 8.4.2 
(api:1/proto:86-101)
Sep  5 13:03:16 sms246104 kernel: drbd: GIT-hash

Re: [DRBD-user] DRDB over Software RAID1 - Failure: (104) Can not open backing device

2013-08-26 Thread Dan Barker
DRBD can't use /dev/md4 because it's in use. It has a mounted filesystem using 
the entire device.

You have your resource stack out of order. You would share the drbd device 
created from the md4 device. However, /dev/md4 already has a file system so you 
must shrink it or use external metadata. I'd suggest you shrink it and use 
internal metadata.

What you have:

* There are various disk devices (not specified)

* Upon which is run raid providing /dev/md4

* Which contains an ext2 filesystem (/srv)

* Which is shared.

What you want:

* There are various disk devices (not specified)

* Upon which is run raid providing /dev/md4

* Which is the backing device for DRBD

* Which provides /dev/drbd1

* Which contains an ext2 filesystem (/srv)

* Which is shared.

Clear as mud?

Dan

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Martin Krammer, New 
Media Interactive
Sent: Monday, August 19, 2013 4:16 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] DRDB over Software RAID1 - Failure: (104) Can not open 
backing device

I have two webservers running debian 7 (stable) with software RAID1 and DRBD 
8.3.11.
On both servers there are the following shares, which should be connected:

On stella: /dev/md4   919014380 
204664 872126436   1% /srv
On laura: /dev/md4   442143360 
153168996 266514700  37% /srv

Later, the data of laura should be syncronized on /dev/drdb1.

The conf-file looks like:

resource r1 {
on stella {
device/dev/drbd1;
disk  /dev/md4;
address   192.168.1.1:7789;
meta-disk /dev/sdb3[0];
}
on laura {
device/dev/drbd1;
disk  /dev/md4;
address   192.168.1.2:7789;
meta-disk /dev/sdb2[0];
}
}

If I try to attach...

root@stella:/srv# drbdadm attach r1

  --==  Thank you for participating in the global usage survey  ==--
The server's response is:

node already registered
1: Failure: (104) Can not open backing device.
Command 'drbdsetup 1 disk /dev/md4 /dev/sdb3 0 --set-defaults --create-device 
--on-io-error=detach' terminated with exit code 10

Could anybody help me please?

Martin.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Sync of Nodes

2013-08-19 Thread Dan Barker
 -Original Message-
 From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
 boun...@lists.linbit.com] On Behalf Of Walter Robert Ditzler
 Sent: Monday, August 19, 2013 5:44 AM
 To: drbd-user@lists.linbit.com
 Subject: [DRBD-user] Sync of Nodes
 
 Hi all,
 
 I just made a test this weekend onto my new 2 XEN hosts:
 
 - Debian Wheezy, Kernel 3.10.7, DRBD Tools 8.4.3, Node A=master, Node
 B=secondary, XEN 4.3.0 Guests on LVM
 
 ***
 root@srv-ldeb-xen001:/var/local/abbeoo/xen.d# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.4.3 (api:1/proto:86-101)
 srcversion: 19422058F8A2D4AC0C8EF09
 m:res   cs ro ds p
 mounted
 fstype
 11:drbd_host11  Connected  Primary/Secondary  UpToDate/ UpToDate  C
 ***
 
 
 What I did is to check if synchronization works. I tested as followed:
 
 On Node A:
 1) Create a file and save it in Desktop in my XEN Guest System
 2) xl destroy host11 (kill my XEN Guest System)
 3) drbdadm secondary drbd_host11
 
 Then I continue on Node B
 1) drbdadm primary drbd_host11
 2) xl create host11.cfg (Start my XEN Guest System)
 3) No file on the desktop!!!
 
 It seem that the synchronization doesn't work. When I shutdown Node B and
 start Node A again, I see the file again on the desktop.
 
 Can anyone help me out here? Do I make any mistake while starting DRBD at
 boot?
 
 Thanks a lot,
 
 Walter
 
 
 My Nodes are linked and working
 
 ***
 root@srv-ldeb-xen001:/var/local/abbeoo/xen.d# cat /proc/drbd
 version: 8.4.3 (api:1/proto:86-101)
 srcversion: 19422058F8A2D4AC0C8EF09
 
 11: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
 ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 ***
 
 ***
 root@srv-ldeb-xen002:/var/local/xen/xen.d# cat /proc/drbd
 version: 8.3.13 (api:88/proto:86-96)
 srcversion: ECB278A2285B40525B8362B
 
 11: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
 ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 ***
 
 
 *** PING on Node A to Node B
 root@srv-ldeb-xen001:/var/local/abbeoo/xen.d# ping 10.255.255.2
 PING 10.255.255.2 (10.255.255.2) 56(84) bytes of data.
 64 bytes from 10.255.255.2: icmp_req=1 ttl=64 time=0.195 ms
 ***
 
 *** Ping on Node B to Node A
 root@srv-ldeb-xen002:/var/local/xen/xen.d# ping 10.255.255.1
 PING 10.255.255.1 (10.255.255.1) 56(84) bytes of data.
 64 bytes from 10.255.255.1: icmp_req=1 ttl=64 time=0.277 ms
 ***
 

All the reads and writes counters are showing zeros.

Are you sure Xen is using the drbd resources and not the underlying storage? 
Xen should be accessing /dev/drbd11. My guess is that it's looking at 
/dev/vgdata1/host11.

Dan
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Upgrading from 8.3.11-2 to 8.3.15-2

2013-06-22 Thread Dan Barker
No worries. The servers negotiate a common protocol. You can even jump to 
8.4.whatever.

Dan

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Rick Cone
Sent: Saturday, June 22, 2013 4:27 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Upgrading from 8.3.11-2 to 8.3.15-2

Hello,

I'm going to upgrade form 8.3.11-2 to 8.3.15-2.

Just a routine question.  Is there any issue with running one node as 8.3.11-2 
and the other as 8.3.15-2 for a few hours (like maybe 12 hours)?  I want to 
upgrade the secondary system and let it run for a while, and then failover and 
upgrade the primary late tonight, etc.  Just curious if there would be issues 
with replication, etc., while in that mode.

Thanks!
Rick
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdadm verify resource with cron

2013-06-21 Thread Dan Barker
Roberto:

You are not looking for a return code, and the oos counter in /proc/drbd might 
not go up for hours.

What you want is to configure the Out Of Sync handler. Mine (cron to run 
weekly, one resource per night) says:
out-of-sync  /usr/lib/drbd/notify-out-of-sync.sh 
myem...@address.tldmailto:myem...@address.tld;

When I get these email, I can remedy the situation; disconnect/connect to 
resync, replace a drive if it's bad, whatever. It only sends the email about 
once per year (not using RAID).

Dan in Atlanta

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Roberto Fastec
Sent: Thursday, June 20, 2013 6:56 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] drbdadm verify resource with cron

Dear readers

I have configured drbd with resource syncer

Since everything is running fine, the command
#drbdadm verify resource

outputs nothing

I'm wondering what the output could be if some re-sync is needed, this is meant 
to use it with cron.

Thank you for hinting

Roberto
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdadm verify resource with cron

2013-06-21 Thread Dan Barker
I think so, but it only happens rarely. Doesn't really matter.

Dan

 -Original Message-
 From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
 boun...@lists.linbit.com] On Behalf Of AZ 9901
 Sent: Friday, June 21, 2013 10:53 AM
 To: drbd-user@lists.linbit.com
 Subject: Re: [DRBD-user] drbdadm verify resource with cron
 
 2013/6/21 Dan Barker dbar...@visioncomm.net:
  What you want is to configure the Out Of Sync handler. Mine (cron to run
  weekly, one resource per night) says:
 
  out-of-sync  /usr/lib/drbd/notify-out-of-sync.sh
  myem...@address.tld;
 
 Do both nodes send the mail, or only one ?
 Thank you !
 Ben
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
attachment: winmail.dat___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdadm verify resource with cron

2013-06-21 Thread Dan Barker
There is no such thing as in sync or out of sync if the resource is standalone. 
It simply is what it is.

Dan

From: andreas graeper [mailto:agrae...@googlemail.com]
Sent: Friday, June 21, 2013 11:09 AM
To: Dan Barker
Subject: Re: [DRBD-user] drbdadm verify resource with cron

hi,
when a drdb-device is standalone + primary, what tells me oss  0 ?
dangerous or is it just a difference to the state when it was connected last 
time ?

thanks andreas

2013/6/21 Dan Barker dbar...@visioncomm.netmailto:dbar...@visioncomm.net
Roberto:

You are not looking for a return code, and the oos counter in /proc/drbd might 
not go up for hours.

What you want is to configure the Out Of Sync handler. Mine (cron to run 
weekly, one resource per night) says:
out-of-sync  /usr/lib/drbd/notify-out-of-sync.sh 
myem...@address.tldmailto:myem...@address.tld;

When I get these email, I can remedy the situation; disconnect/connect to 
resync, replace a drive if it's bad, whatever. It only sends the email about 
once per year (not using RAID).

Dan in Atlanta

From: 
drbd-user-boun...@lists.linbit.commailto:drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.commailto:drbd-user-boun...@lists.linbit.com]
 On Behalf Of Roberto Fastec
Sent: Thursday, June 20, 2013 6:56 PM
To: drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com
Subject: [DRBD-user] drbdadm verify resource with cron

Dear readers

I have configured drbd with resource syncer

Since everything is running fine, the command
#drbdadm verify resource

outputs nothing

I'm wondering what the output could be if some re-sync is needed, this is meant 
to use it with cron.

Thank you for hinting

Roberto

___
drbd-user mailing list
drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

attachment: winmail.dat___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Replication problems constants with DRBD 8.3.10

2013-06-17 Thread Dan Barker
The suggestion is to replace the actual RealTek NIC with an Intel NIC or some 
other dependable brand, not to use different drivers on the hardware you have. 

Clear as mud?

Dan  (top poster) in Atlanta

 -Original Message-
 From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
 boun...@lists.linbit.com] On Behalf Of cesar
 Sent: Monday, June 17, 2013 10:34 AM
 To: drbd-user@lists.linbit.com
 Subject: Re: [DRBD-user] Replication problems constants with DRBD 8.3.10
 
 Hi to all people
 
 Excuse me if I ask a question from rookie
 
 But tell me that: the in-kernel realtek drivers of the 2.6.32 kernel were
 known to cause troubles. I still have to replace them for optimum
 stability.
 try to install the current version of the driver from realtek ot switch to
 Intel-NICs
 
 So, I don't know how do it, and in this link
 http://packages.debian.org/squeeze/firmware-realtek say for download the
 drivers for Realtek and Debian, but neither know if these drivers are more
 modern that the pve kernel.
 
 I have a Kernel of RHEL 6 modified by the authors ans SO Debian squeeze
 
 And to make matters worse, in this country (Paraguay) is not easy to buy
 another brand of NICs.
 
 The model of NIC is RTL8168E and lshw shows
 firmware=rtl_nic/rtl8168e-2.fw
 and the link of debian (shown above) shows among other models:
 * Realtek RTL8111D-1/RTL8168D-1 firmware (rtl_nic/rtl8168d-1.fw)
 * Realtek RTL8111D-2/RTL8168D-2 firmware (rtl_nic/rtl8168d-2.fw)
 * Realtek RTL8168E-1 firmware (rtl_nic/rtl8168e-1.fw)
 * Realtek RTL8168E-2 firmware (rtl_nic/rtl8168e-2.fw)
 * Realtek RTL8168E-3 firmware (rtl_nic/rtl8168e-3.fw)
 
 Can anyone help me with this problem.
 
 Best regards
 Cesar
 
 
 
 --
 View this message in context: http://drbd.10923.n7.nabble.com/Replication-
 problems-constants-with-DRBD-8-3-10-tp17896p17913.html
 Sent from the DRBD - User mailing list archive at Nabble.com.
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd+mysql+innodb

2013-06-12 Thread Dan Barker
rsync will not be able to synchronize from a failed disk, drbd already has 
done so.

Dan in Atlanta

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Robinson, Eric
Sent: Wednesday, June 12, 2013 6:20 PM
To: Dirk Bonenkamp - ProActive
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] drbd+mysql+innodb

Hi Dirk - Thanks for the feedback. I do need some clarification, though. DRBD 
replicates disk block changes to a standby volume. If the primary node suddenly 
fails, the cluster manager promotes the standby node to primary and starts the 
MySQL service. Logically, this seems exactly the same as simply rsyncing the 
data to the new server and starting the MySQL service. Why would it work with 
DRBD but not with rsync? Thanks for your patience while I explore this.

Note: we have over 500 separate MySQL database instances using MyISAM. I am 
totally not stoked about the idea of using 300% more disk space and gobs more 
memory.

--
Eric Robinson


From: Dirk Bonenkamp - ProActive [mailto:d...@proactive.nl]
Sent: Wednesday, June 12, 2013 7:24 AM
To: Robinson, Eric
Cc: drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] drbd+mysql+innodb

Hi Eric,

We did the same conversion about a year ago. We run MySQL with InnoDB on a DRDB 
back-end. There's alot of stuff that's different between MyISAM and InnoDB, but 
the DRBD thing is the same.

What you say about backups is correct, but this hasn't anything to do with 
DRBD. DRDB will do fine, some other quick non-DRDB things:

- MySQL tuning is (even more) essential with InnoDB.
- InnoDB tables use (a lot) more diskspace than MyISAM, our disk usage was 
nearly 300% of MyISAM's usage for the same dataset.
- If you want performance, you want to be able to load your dataset in memory.

Kind regards,

Dirk

Op 12-6-2013 15:44, Robinson, Eric schreef:
We have been a MyISAM shop forever but we are considering switching to innodb. 
There is scant information available on using innodb with drbd. Are there 
special considerations and pitfalls? I have been told that it is not possible 
to backup innodb by doing a simple rsync of the data directory to another 
server like we can do with myisam. If that is true, what does that say about 
using innodb with drbd, which does essentially the same thing?

--
Eric Robinson



Disclaimer - June 12, 2013 This email and any files transmitted with it are 
confidential and intended solely for 
'drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com'. If you are not 
the named addressee you should not disseminate, distribute, copy or alter this 
email. Any views or opinions presented in this email are solely those of the 
author and might not represent those of Physicians' Managed Care or Physician 
Select Management. Warning: Although Physicians' Managed Care or Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.


___

drbd-user mailing list

drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com

http://lists.linbit.com/mailman/listinfo/drbd-user

--
[cid:image001.gif@01CE67AC.9C400440]http://www.proactive.nl
T

023 - 5422299



www.proactive.nlhttp://www.proactive.nl



[cid:image002.gif@01CE67AC.9C400440]http://nl.linkedin.com/in/dirkbonenkamp



[My status]skype:dirkbonenkamp?call


inline: image001.gifinline: image002.gifinline: image003.png___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] promoting one of new disks

2013-05-29 Thread Dan Barker
Yes, you can assume all zeros on new disks and skip the first sync. You can 
even do it with dirty disks, as long as the file systems are new, but the first 
verify will be a deusy.

If you'd said what version of DRBD you were running, I'd give you the link in 
the manual for the correct command. But there are different manual and command 
syntaxes for 8.3 vs 8.4. Look in the appropriate doc for clear-bitmap.

Dan

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of andreas graeper
Sent: Wednesday, May 29, 2013 7:19 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] promoting one of new disks

hi,
pure drbd, no corosync/pacemaker (first attempt, i started to read about 
lvm,cluster,... some days ago)
when drbd on top of lv (lvm-logical-volumn) and on both nodes the lv are 
formatted with
mkfs.ext4, cause of error 40 i had to zero the start of lv.
after 'create-md' and 'up' on both nodes (cs:Connected ro=Sec/Sec 
ds=Inconsistant/Inconsistant) now i tried
 drbdadm --force primary r0
but i got error 17 and found in www a solution
 drbdadm invalidate r0
now automatically sync starts (cs:SyncTarget ro=Sec/Sec ds=Incons/Uptodate)
the peer is implicit declared uptodate this way ?
i should have invalidated the peer if i want to set local node to be primary ? 
(in case there are actually data on the node that shall become primary)
but the main question:
can i avoid the sync process, if on new disks there are no data ?

sync is done (cs:Connected ro=Sec/Sec ds=Uptodate/Uptodate)

i read, when brbd with pacemaker, then (i guess the resource-manager forces 
this) only on
primary i have rw-access, and on secondary i cannot even read the mirrored data.
i found the data (lv) already mounted on primary. i created a small file in 
/mnt/lva (mountpoint declared in /etc/fstab)

now i want to change roles (manual 6.5 basic manual failover).
`mount` or `df -h` does not show the mounted device.

umount /dev/vg_xxx/lv_aaa   = not mounted
umount /mnt/lva   = not mounted
umount /dev/mapper/vg_xxx-lv_aaa = not mounted
i simply tried (on current primary) :
 drbdadm secondary r
 cd /mnt/lva
 # change on that small file
 /proc/drbd - ds:uptodate/uptodate
on secondary :
 drbdadm primary r
 - ro:Primary/Secondary ds:uptodate/uptodate (roles changed)
on new primary i cannot mount the volumn cause its busy
on old primary i can still see the new file
now i stopped drbd on old primary and tried to mount the lv
but it cannot mounted as ext4 anymore, cause it is damaged by the call of dd ?!
how this is done correct: brbd on top of lv with internal metadata ?
 ? i must not zero the head of the formatted lv
 ? i have to force writing metadata to the end of that lv

thanks in advance
andreas



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dumb question

2013-05-22 Thread Dan Barker
The only dumb questions are the unasked question, and the question asked a 
second timeg.

If you have run a verify, then all the out-of-sync blocks are marked. cat 
/proc/drbd should show very large numbers of oos (Out Of Sync) blocks. Simply 
disconnect/reconnect the resource and it will resync the out of sync blocks in 
the proper direction. The disconnect/reconnect can be done on the Primary or 
Secondary, it doesn't matter. Forcing a sync will sync all blocks, but your 
verify has already determined which need to be updated. It will be faster, but 
still quite a lot of data, I'd imagine. You will want to do the 
disconnect/reconnect at a low-use time of day, but I'm not sure waiting for the 
weekend is a good idea. You have no reliable redundancy until the oos count is 
zero.

Verify only marks oos blocks and produces a message - it doesn't change the 
status of the disks.

hth

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Prater, James K.
Sent: Monday, May 20, 2013 7:19 AM
To: Lars Ellenberg; drbd-user@lists.linbit.com
Subject: [DRBD-user] Dumb question

Hello Lars,


   I have a real dumb question.   I have created mirrors, between two peers 
(active/passive) but did not do the initial sync.  I was going to wait until 
the weekend to do that.   However I had forgotten that I had placed drbdadm 
verify all in my cron, and not it is verifying.   It had marked the volumes 
UpToDate.   My question are these volumes really UpToDate or will I have to 
run drbdadm primary --force, or actually invalidate one then force the 
synchronization?


Thanks


James 
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Not able to test Automatic split brain recovery policies

2013-04-11 Thread Dan Barker
 -Original Message-
 From: Shailesh Vaidya [mailto:shailesh_vai...@persistent.co.in]
 Sent: Thursday, April 11, 2013 1:50 AM
 To: Digimer
 Cc: Dan Barker; drbd-user@lists.linbit.com
 Subject: RE: [DRBD-user] Not able to test Automatic split brain recovery
 policies
 
 Hi Digimer,
 
 Thanks for help and explanation. I will try it out fencing option.
 
 However, I would like to validate if what I am testing for split-brain is
 correct or not. Also what could be done for simple split-brain auto-
 recovery through configuration without fencing.
 

There is no simple split-brain recovery. Split Brain only occurs after an 
error of some sort causing two different nodes to write to the same resource 
while disconnected. Anything other than manual recovery of files or blocks will 
lose data. In many cases, it's not even possible to determine what data is 
being lost or how to recover it. You just have to pick the lesser of two evils 
and move forward, honoring the writes to one node and discarding the writes 
done on the other. Most applications and file systems react poorly to having 
writes of theirs discarded.

Any effort spent automating the recovery of a split-brain could better be spent 
identifying how your configuration created the split brain, usually dual 
primary without sufficient controls in place to prevent split brain in the 
first place.

ymmv

Dan

 Regards,
 Shailesh Vaidya
 
 
 -Original Message-
 From: Digimer [mailto:li...@alteeve.ca]
 Sent: Wednesday, April 10, 2013 11:17 PM
 To: Shailesh Vaidya
 Cc: Dan Barker; drbd-user@lists.linbit.com
 Subject: Re: [DRBD-user] Not able to test Automatic split brain recovery
 policies
 
 I've not done fencing in DRBD alone, so I am unable to offer specific
 suggestions. I can speak to generally what you need though.
 
 You can set DRBD's fencing policy to 'resource-and-stonith'. What this
 does is tell DRBD When you lose your peer, block IO and call a fence
 against it. The fence action reaches out (usually via IPMI or managed
 PDU) and forces the peer offline. After that, the surviving node will
 proceed. This way, at no time will both nodes be operating in StandAlone
 and Primary.
 
 You will want to set a delay so that one of the nodes has a head start
 when trying to fence the other. This way, in your test, when the
 communication breaks but the nodes are still up, you remove the risk of
 both nodes being fenced. What this does is say when you want to fence
 node 1, wait 15 seconds before doing so. when you want to fence node 2,
 don't wait and immediately fence. Thus, when it's a break in
 communications, you can predict which node will win the fence.
 
 When a node really fails, it will obviously not try to fence, being dead,
 so the healthy node will always win the fence and then take over.
 
 How you actually fence the peer will depends on what options you have
 available. Then you need a script that will actually do the work of
 reaching out and killing the peer. As I mentioned, this is usually done
 via IPMI (or branded out of band interfaces like iLO, DRAC, RSA, etc) or
 by using managed PDUs, like the APC AP7900. To do this, you need to have a
 scrip that reads certain environment variables set by DRBD, executes the
 request and then returns an appropriate exit code based on success or
 failure.
 
 I wrote such a fence handler called rhcs_fence (based on
 obliterate-peer.sh) which handles fencing by passing the request up to
 rhcs. You should be able to fairly easily adapt it to work with your
 setup.
 
 https://github.com/digimer/rhcs_fence
 
 Hope this helps clarify things.
 
 digimer
 
 On 04/10/2013 01:22 PM, Shailesh Vaidya wrote:
  Hi Don,
 
  Yup 8.3.8 is quit old but need to work with it for now.
 
  I am not using fencing and neither pacemaker or RHCS
 
  What I observed is that after split-brain its getting disconnected and
 dropped connection. both became unknown to each other. I am not sure is
 this issue with my test procedure itself.
 
  Do I need to make any additional configuration.
 
  Thanks,
  Shailesh Vaidya.
 
  
  From: Digimer [li...@alteeve.ca]
  Sent: Wednesday, April 10, 2013 8:11 PM
  To: Shailesh Vaidya
  Cc: Dan Barker; drbd-user@lists.linbit.com
  Subject: Re: [DRBD-user] Not able to test Automatic split brain recovery
 policies
 
  To your immediate problem;
 
  If you had configured fencing, drbd would not split-brain. Are you
  using pacemaker or RHCS?
 
  Secondly, 8.3.8 is very, very old. Upgrading to a newer 8.3.x version
  would be a good idea.
 
  Back to split-brain; DRBD declares a split-brain as soon as both nodes
  are StandAlone and Primary. To recover, you need to tell DRBD which
  node to consider good and then drop the changes on the peer and let
  the good node sync to the other node.
 
  On 04/10/2013 08:08 AM, Shailesh Vaidya wrote:
  I have followed same procedure (disable Ethernet card) etc and after
  that drbd status on both the nodes

Re: [DRBD-user] Not able to test Automatic split brain recovery policies

2013-04-10 Thread Dan Barker
You don't show the status of the nodes, but I imagine you have two primary 
nodes. There is no handler specified for two primary nodes. Did you have two 
primary, disconnected nodes?

It shouldn't be possible to create split brain without writing on both nodes.

Dan

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Shailesh Vaidya
Sent: Wednesday, April 10, 2013 1:58 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Not able to test Automatic split brain recovery policies

Hello,

I am using DRBD 8.3.8

I have configured Automatic split brain recovery policies as below in 
/etc/drbd.conf

net {
max-buffers 2048;
ko-count 4;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
}

My both machines are Virtual machines so not connected actual back-to-back 
connection. To reproduce split-brain, I am using below procedure,

1.On Primary disable Ethernet card from 'Virtual Machine properties'
2.Wait to Secondery to start switch over and again enable Ethernet card on 
Primary

Log shows mw that split-brain is occurred , however its shows connection 
dropped.

Apr  9 10:30:15 drbd1 kernel: block drbd0: uuid_compare()=100 by rule 90
Apr  9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm 
initial-split-brain minor-0
Apr  9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm 
initial-split-brain minor-0 exit code 0 (0x0)
Apr  9 10:30:15 drbd1 kernel: block drbd0: Split-Brain detected but unresolved, 
dropping connection!
Apr  9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm 
split-brain minor-0
Apr  9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm 
split-brain minor-0 exit code 0 (0x0)
Apr  9 10:30:15 drbd1 kernel: block drbd0: conn( WFReportParams - 
Disconnecting )

Full DRBD conf file

[root@drbd1 ~]# cat /etc/drbd.conf
global {
usage-count no;
}

resource r0 {
protocol C;
#incon-degr-cmd echo !DRBD! pri on incon-degr | wall ; sleep 60 ; halt -f;

on drbd1 {
device /dev/drbd0;
disk   /dev/sda3;
address10.55.199.51:7789;
meta-disk  internal;
}
on drbd2 {
device/dev/drbd0;
disk  /dev/sda3;
address   10.55.199.52:7789;
meta-disk internal;
}

disk {
on-io-error   detach;
}

net {
max-buffers 2048;
ko-count 4;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
}

syncer {
rate 25M;
al-extents 257; # must be a prime number
}

startup {
wfc-timeout  20;
degr-wfc-timeout 120;# 2 minutes.
}
}

[root@drbd1 ~]# vi /var/log/messages
[root@drbd1 ~]#
[root@drbd1 ~]# cat /etc/drbd.conf
global {
usage-count no;
}

resource r0 {
protocol C;
#incon-degr-cmd echo !DRBD! pri on incon-degr | wall ; sleep 60 ; halt -f;

on drbd1 {
device /dev/drbd0;
disk   /dev/sda3;
address10.55.199.51:7789;
meta-disk  internal;
}
on drbd2 {
device/dev/drbd0;
disk  /dev/sda3;
address   10.55.199.52:7789;
meta-disk internal;
}

disk {
on-io-error   detach;
}

net {
max-buffers 2048;
ko-count 4;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
}

syncer {
rate 25M;
al-extents 257; # must be a prime number
}

startup {
wfc-timeout  20;
degr-wfc-timeout 120;# 2 minutes.
}
}

[root@drbd1 ~]#


Is this configuration issue or my testing procedure is not proper?

Regards,
Shailesh Vaidya


DISCLAIMER == This e-mail may contain privileged and confidential 
information which is the property of Persistent Systems Ltd. It is intended 
only for the use of the individual or entity to which it is addressed. If you 
are not the intended recipient, you are not authorized to read, retain, copy, 
print, distribute or use this message. If you have received this communication 
in error, please notify the sender and delete all copies of this message. 
Persistent Systems Ltd. does not accept any liability for virus infected mails.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] does drbd act like a loop device?

2013-03-27 Thread Dan Ringdahl

I have been using it this way for over a year now without issue.

On 03/27/2013 03:40 AM, Maurits van de Lande wrote:


Hello,

I'm would like to use a volume cached with flashcache as a drbd 
backing device with drbd 8.3. In order for flashcache to work there 
should not be a loop device in the storage path.


Like:

/dev/sdb1 regular disk raid6 partition

/dev/sdc1   SSD based raid 1 partition

The I use flashcache_create -p back cachedev /dev/sdc1 /dev/sdb1 to 
create the cached disk


Drbd will use /dev/mapper/cachedev

Does drbd act like a loop device? Has anybody used drbd in this way? 
(This is a different setup as Florian Haas used with drbd 8.4)


Best regards,

Maurits van de Lande



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-25 Thread Dan Barker
 -Original Message-
 From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
 boun...@lists.linbit.com] On Behalf Of Stanislav German-Evtushenko
 Sent: Monday, March 25, 2013 10:13 AM
 To: Radu Radutiu
 Subject: Re: [DRBD-user] Uncatchable DRBD out-of-sync issue
 
 Thank you for suggestion.
 
 I could investigate if this is a swap region but it wouldn't help
 because:
 1) I can check it for Linux VMs but I can't do the same for Windows
 ones (because swap file is in file system).
 2) I can't do online migration even if only swap region is out of sync
 because it will make a VM unstable.
 
 On Mon, Mar 25, 2013 at 5:07 PM, Radu Radutiu rradu...@gmail.com wrote:
  I was asking if the out of sync blocks belong to a swap partition of
 one of
  the virtual machines. I see exactly the same problem with a setup more
 or
  less similar to your setup (my setup is an active-passive one, lvm on
 top
  dbrb, with kvm virtual machines using these logical volumes as
 storage).
  I seem to recall some older messages on drbd list stating that it might
 be
  OK to have oos blocks for the swap device.
 
  Best Regards,
 
  Radu
 
 
 

You don't need Dual-Primary to live migrate - you need shared storage. Two 
completely different concepts. Now, the shared storage can be based on a DRBD 
Primary, and you can have it fail over to a DRBD Secondary, but dual primary is 
not going to do what you want and shared storage will. The storage you share 
may be virtualized, if you like.

Dan
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Uncatchable DRBD out-of-sync issue

2013-03-24 Thread Dan Barker
Stanislav, my system sends me an email when verify finds an out-of-sync 
condition. You can use the same handler if you like.

In my global, handlers section:
out-of-sync  /usr/lib/drbd/notify-out-of-sync.sh myemailaddress;

Are you resyncing after the error is detected (disconnect/connect the resource)?

Dan, in Atlanta

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Stanislav 
German-Evtushenko
Sent: Sunday, March 24, 2013 7:00 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Uncatchable DRBD out-of-sync issue

Dear all,

I'm trying to catch the issue with out-of-sync and I've stuck so far. Can 
anybody give me a hint what can I check next?

Configuration:
- two nodes Dell PowerEdge R710 (both nodes of the same hadrware, same 
configuration)
- drbd0 master-master (size is 900GiB)
- direct connection (two 1Gbit/s ethernet adapters in bonding balance-rr)
- data-integrity-alg is crc32c (it has been enabled for testing purposes)
- LVM on top of DRBD (LVM volumes are used by virtual machines)

Software:
- DRBD module version: 8.3.13
- kernel: Linux 2.6.32-19-pve #1 SMP x86_64 GNU/Linux

Problem:
- Each time when I do online verification it founds some sectors are out of 
sync (not many usually, about 5-15 messages after verification is done)
- In fact these sectors are not synced (checked with dd and md5sum)
- data-integrity-alg doesn't cause any messages in logs since drbdadm is 
connected all and until verification process finds some sectors out of sync

Questions:
- How is that possible?
- Why data-integrity-alg doesn't catch the problem?
- How to fix?

*** extracts from kernel log ***
Mar 24 13:23:38 host1 kernel: block drbd0: conn( Connected - VerifyS )
Mar 24 13:23:38 host1 kernel: block drbd0: Starting Online Verify from sector 0
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996928, size=8 
(sectors)
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996984, size=8 
(sectors)
Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718997224, size=8 
(sectors)
*

*** check with dd and md5sum ***
# dd iflag=direct if=/dev/drbd0 bs=512 skip=718997224 count=8 | md5sum
host1: 669a5c2ba22fa931aac16cdd2f03e22a
host2: ceeac3bd59178ee13f94ce283e3a4de3


*** drbdadm /dev/drbd0 show ***
disk {
size0s _is_default; # bytes
on-io-error pass_on _is_default;
fencing dont-care _is_default;
max-bio-bvecs   0 _is_default;
}
net {
timeout 60 _is_default; # 1/10 seconds
max-epoch-size  2048 _is_default;
max-buffers 2048 _is_default;
unplug-watermark128 _is_default;
connect-int 10 _is_default; # seconds
ping-int10 _is_default; # seconds
sndbuf-size 0 _is_default; # bytes
rcvbuf-size 0 _is_default; # bytes
ko-count0 _is_default;
allow-two-primaries;
cram-hmac-alg   sha1;
shared-secret   XXX;
after-sb-0pri   discard-zero-changes;
after-sb-1pri   discard-secondary;
after-sb-2pri   disconnect _is_default;
rr-conflict disconnect _is_default;
ping-timeout5 _is_default; # 1/10 seconds
data-integrity-alg  crc32c;
on-congestion   block _is_default;
congestion-fill 0s _is_default; # byte
congestion-extents  127 _is_default;
}
syncer {
rate153600k; # bytes/second
after   -1 _is_default;
al-extents  127 _is_default;
verify-alg  md5;
on-no-data-accessible   io-error _is_default;
c-plan-ahead0 _is_default; # 1/10 seconds
c-delay-target  10 _is_default; # 1/10 seconds
c-fill-target   0s _is_default; # bytes
c-max-rate  102400k _is_default; # bytes/second
c-min-rate  4096k _is_default; # bytes/second
}
protocol C;
_this_host {
device  minor 0;
disk/dev/sda3;
meta-disk   internal;
address ipv4 172.23.10.1:7788http://172.23.10.1:7788;
}
_remote_host {
address ipv4 172.23.10.2:7788http://172.23.10.2:7788;
}
# (89)  unknown tag = (integer) 0   [len: 4]
# Found unknown tags, you should update your
# userland tools
***

Best regards,
Stanislav
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD log files

2013-03-14 Thread Phillips, Dan
Q. Is there one central place in DRBD where the log files are setup? Stdout 
redirected from the screen to a log file? Where are all echo cmds going?

Thanks,

Dan
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] dump-md data

2013-03-11 Thread Dan Barker
The dump could be quite large if the bit map has a lot of non-zeros in it. As 
it is, all 9M times 64 bits times 4096 bytes/bit covers your 2+ terabyte 
metadata and all are zero so it takes one line. 8923456 times 
0x;

Take down your secondary, run for a few days, and then dump the md to get a 
respectably large dumpg.

[Don't]

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Olivier Le Cam
Sent: Monday, March 11, 2013 2:48 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] dump-md data

Hi -

By making a dump-md of a several TB device I expected to get a 
relatively large file. I realize that it is actually the opposite: the 
dump does only contain a dozen lines, like following.

# DRBD meta data dump
# 2013-03-11 18:38:21 +0100 [1363023501]
# nfs-2 drbdmeta 0 v08 /dev/vg1/storage internal dump-md
#

version v08;

# md_size_sect 139512
# md_offset 2339289165824
# al_offset 2339289133056
# bm_offset 2339217739776

uuid {
 0x7117E0379FF23460; 0x; 0x6273B7EE32734046; 
0x6272B7EE32734047;
 flags 0x0091;
}
# al-extents 3389;
la-size-sect 4568784648;
bm-byte-per-bit 4096;
device-uuid 0x64B2B985FCFD7314;
la-peer-max-bio-size 131072;
# bm-bytes 71387264;
bm {
# at 0kB
 8923456 times 0x;
}
# bits-set 0;

Is this normal or do I missed something?

PS: the dump-md drbdadm command requested that I first drbdadm apply-al 
before being able to dump de meta-data.

Thanks and best regards,
-- 
Olivier
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd pacemaker scst/srp 2 node active/passive question

2013-03-01 Thread Dan Barker
That's easy, I've been doing it for years, going back to ESXi 4.1 at least, 
maybe even to 4.0. I run ESXi 5.1 now.

Set up both the servers in ESXi, Configuration, Storage adapters. Use static 
discovery, because you can list the targets whether they exist or not. When the 
primary goes down, the secondary will come up (if it's available) on ESXi 
without intervention.

In my setup, the .46 drbd is secondary, and invisible to ESXi. .47 is primary 
and visible to ESXi. I run the following targets (you can do this with the GUI, 
but I get lazy):

vmkiscsi-tool -S -a 172.30.0.46 iqn.2012-05.com.visioncomm.DrbdR:Storage03 
vmhba39
vmkiscsi-tool -S -a 172.30.0.46 iqn.2012-06.com.visioncomm.DrbdR:Storage02 
vmhba39
vmkiscsi-tool -S -a 172.30.0.46 iqn.2012-08.com.visioncomm.DrbdR:Storage01 
vmhba39
vmkiscsi-tool -S -a 172.30.0.46 iqn.2012-08.com.visioncomm.DrbdR:Storage00 
vmhba39
vmkiscsi-tool -S -a 172.30.0.47 iqn.2012-05.com.visioncomm.DrbdR:Storage03 
vmhba39
vmkiscsi-tool -S -a 172.30.0.47 iqn.2012-06.com.visioncomm.DrbdR:Storage02 
vmhba39
vmkiscsi-tool -S -a 172.30.0.47 iqn.2012-08.com.visioncomm.DrbdR:Storage01 
vmhba39
vmkiscsi-tool -S -a 172.30.0.47 iqn.2012-08.com.visioncomm.DrbdR:Storage00 
vmhba39

If both are primary, I see 4 targets, 8 paths. This neverg happens. 
Usually, I see 4 targets, 4 paths.

I always do the switchover manually, so you might see slightly different 
results. My steps are:

 Go primary on the .46 server.

 Start the target (iscsi-target) software on the .46 server.

 Rescan on all ESXi.

 Stop the target software on the .47 server (ESXi fails over to the other path 
seamlessly at this point).

 Stop drbd on .47 and do whatever maintenance was necessary.

To reverse:

 The same steps, but you can skip the scan if the ESXi have seen both targets 
since boot.  One shows up as active and the other shows up as dead, but the VMs 
don't care.

hth

Dan in Atlanta

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Jason Thomas
Sent: Thursday, February 28, 2013 9:50 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] drbd pacemaker scst/srp 2 node active/passive question

First time posting to a mailing list hope I get this right.

I have a 2 node DRBD backed SCST/SRP single target(ib_srpt) setup working great 
using pacemaker/corosync.  I am using this for the data store for a mail 
server.  Where I am running into an issue is the initiator's are running on 
vmware ESXi 4.1 hosts, when a fail over occurs on the target the vm host 
initiators go dead and you have to rescan to pick up the target via the new 
path causing the vm guest to go down until the new path is discovered.

Hope that makes sense.

What I see as the potential problem is lvm and scst are only active on the 
primary node thus the secondary node is un-discoverable by ESXi host until it 
fails over.  I am not sure what the answer is but my thought process is I am 
trying to figure out if it is possible to have: 

1. on the node1 (primary node) drbd(primary), lvm, scst with the target in 
read/write mode 
2. on the node2 (secondary node) drbd(secondary), lvm, scst with the target in 
read mode 

and when the node1 fails over, node1 scst target goes ready only and node2 scst 
target would switch to read/write.  What I am trying to achieve is the the vm 
host seeing the target and paths at all times.

Hopefully there is an easier solution to this and that I am not making things 
more difficult.  I have been researching this for weeks and at the point of 
frustration.  Any guidance would be appreciated.

Side note: I modified SCSTTarget RA to work with ib_srpt as it was not written 
for it originally and did not find another RA out there specifically for my 
setup.

Thank you for any help you may be able to provide.

Setup:
Initiator machines vmware ESXi 4.1

Target machines
2 nodes running CentOS 2.6.32-279.19.1.el6.x86_64

DRBD:
kmod-drbd84-8.4.2-1.el6_3.elrepo.x86_64

Pacemaker/Corosync:
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64
corosync-1.4.1-7.el6_3.1.x86_64
corosynclib-1.4.1-7.el6_3.1.x86_64

SCST/SRPT:
scst-tools-2.6.32-279.19.1.el6-2.2.1-1.ab.x86_64
kernel-module-scst-iscsi-2.6.32-279.19.1.el6-2.2.1-1.ab.x86_64
kernel-module-scst-core-2.6.32-279.19.1.el6-2.2.1-1.ab.x86_64
kernel-module-scst-srpt-2.6.32-279.19.1.el6-2.2.1-1.ab.x86_64

scst config:

HANDLER vdisk_fileio {

DEVICE disk00 {
filename /dev/drbd-stor/mail-stor
nv_cache 1
}
}

TARGET_DRIVER ib_srpt {
TARGET 0002:c902:0020:2020 {
enabled 1
cpu_mask ff
rel_tgt_id 1

GROUP data {
LUN 0 disk00

INITIATOR 0x8102c902002020210002c903000f2bf3
INITIATOR 0x8102c902002020220002c903000f2bf3

Re: [DRBD-user] Device is held open by someone

2013-02-28 Thread Phillips, Dan
Seems this is a common problem!!! Can you point those of us experiencing  
Device is held open by someone  to a DRBD resource (documentation) that gives 
an overview in this area for a better understanding of what may be going on? I 
will look at www.drbd.org for info.


Thanks,

Dan Phillips

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Andreas Kurz
Sent: Thursday, February 28, 2013 5:07 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Device is held open by someone

On 2013-02-26 13:04, Felipe Gutierrez wrote:
 Hi everyone,
 
 I am trying to do a failover system only with drbd. When my primary 
 node get out of the network, the secondary node became primary and I 
 mount the filesystem.
 secondary# drbdadm primary r7
 secondary# mount /dev/drbd7 /mnt/drbd7/
 
 Until that every thing is ok.
 At this time, my old primary node has to became the secondary and I 
 have to discard my changes.
 primary# umount -l /mnt/drbd7
 primary# drbdadm secondary r7
 7: State change failed: (-12) Device is held open by someone Command 
 'drbdsetup 7 secondary' terminated with exit code 11 primary# drbdadm 
 -- --discard-my-data connect r7
 
 Does anyone have a hint?

It's always worth checking device-mapper:

dmsetup ls --tree -o inverted

Regards,
Andreas

--
Need help with DRBD?
http://www.hastexo.com/now

 Thnaks in advance!
 Felipe
 
 --
 *--
 -- Felipe Oliveira Gutierrez
 -- felipe.o.gutier...@gmail.com mailto:felipe.o.gutier...@gmail.com
 -- https://sites.google.com/site/lipe82/Home/diaadia*
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
 





___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Device is held open by someone

2013-02-27 Thread Dan Barker
Well, that's who's got it open. Task 7354, 27005 and 27174. See which you may 
be able to stop or kill.

Dan

From: Felipe Gutierrez [mailto:felipe.o.gutier...@gmail.com]
Sent: Wednesday, February 27, 2013 11:46 AM
To: Dan Barker
Subject: Re: [DRBD-user] Device is held open by someone

root@cloud15:/home/cloud15# lsof | grep drbd
lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system 
/home/cloud15/.gvfs
  Output information may be incomplete.
drbd7_wor  7354root  cwd   DIR8,2   4096
  2 /
drbd7_wor  7354root  rtd   DIR8,2   4096
  2 /
drbd7_wor  7354root  txt   unknown  
/proc/7354/exe
drbd7_rec 27005root  cwd   DIR8,2   4096
  2 /
drbd7_rec 27005root  rtd   DIR8,2   4096
  2 /
drbd7_rec 27005root  txt   unknown  
/proc/27005/exe
drbd7_ase 27174root  cwd   DIR8,2   4096
  2 /
drbd7_ase 27174root  rtd   DIR8,2   4096
  2 /
drbd7_ase 27174root  txt   unknown  
/proc/27174/exe


On Wed, Feb 27, 2013 at 1:28 PM, Dan Barker 
dbar...@visioncomm.netmailto:dbar...@visioncomm.net wrote:
And what did lsof | grep drbd say?

From: 
drbd-user-boun...@lists.linbit.commailto:drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.commailto:drbd-user-boun...@lists.linbit.com]
 On Behalf Of Felipe Gutierrez
Sent: Wednesday, February 27, 2013 11:24 AM
To: Prater, James K.
Cc: drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com

Subject: Re: [DRBD-user] Device is held open by someone

Hi James,

even stoping Xen I couldn't umount my file system and set drbdadm secondary.

This is my output:

root@cloud15:/home/cloud15# umount /mnt/drbd7/
umount: /mnt/drbd7: device is busy.
(In some cases useful info about processes that use
 the device is found by lsof(8) or fuser(1))
root@cloud15:/home/cloud15# drbd-overview
  7:r7  StandAlone Primary/Unknown UpToDate/DUnknown r- /mnt/drbd7 ext3 23G 
8.3G 14G 39%


Any hint?
Thanks


On Wed, Feb 27, 2013 at 6:50 AM, Prater, James K. 
jpra...@draper.commailto:jpra...@draper.com wrote:
a separate system just for XEN. You are probably having some kernel based 
conflicts that is blocking the release of the volume(s).


From: Felipe Gutierrez 
[mailto:felipe.o.gutier...@gmail.commailto:felipe.o.gutier...@gmail.com]
Sent: Tuesday, February 26, 2013 04:57 PM
To: Arnold Krille arn...@arnoldarts.demailto:arn...@arnoldarts.de
Cc: drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com 
drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Device is held open by someone

Hi Arnold,

I will try to stop Xen.

Talking about stonith/fencing I was working with Corosync+Pacemaker+Xen+DRBD 
but the pace maker configurations got failed when I put all components 
together. I mean, when I was with Corosync+Pacemaker+DRBD the fencing worked 
well! After I put Xen together the pacemaker configuration got failed.

Now I am not using Corosyn+Pacemaker anymore :(

Do you have some clue to me about this?

Thanks in advance!
Felipe
On Tue, Feb 26, 2013 at 6:47 PM, Arnold Krille 
arn...@arnoldarts.demailto:arn...@arnoldarts.de wrote:
On Tue, 26 Feb 2013 09:43:55 -0300 Felipe Gutierrez
felipe.o.gutier...@gmail.commailto:felipe.o.gutier...@gmail.com wrote:
 No, it is not mount. it is why i did the option -l on umount

 primary# umount -l /mnt/drbd7

 I was saving files on this partition with Xen hypervisor.
 If I test the same thing with out Xen, everything works fine.
Well, then make xen stop when you have to switch-over the primary. Or
at least make xen stop using that directory. Could be its still running
vms from there, could be its only still looking at the dir because it
'could' run vms from there.
If you or your cluster-manager want to fail-over the resource and that
fails, its a case for stonith/fencing. Or a case for a manual reboot if
you haven't configured fencing yet.

 I just have to know how to force to make it secondary. For this time I
 rebbot the machine and I get to put to secondary. But I have to
 simulate it with out rebooting.
Have fun,

Arnold

___
drbd-user mailing list
drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user



--
--
-- Felipe Oliveira Gutierrez
-- felipe.o.gutier...@gmail.commailto:felipe.o.gutier...@gmail.com
-- https://sites.google.com/site/lipe82/Home/diaadia



--
--
-- Felipe Oliveira Gutierrez
-- felipe.o.gutier...@gmail.commailto:felipe.o.gutier...@gmail.com
-- https://sites.google.com/site/lipe82/Home/diaadia



--
--
-- Felipe Oliveira Gutierrez
-- felipe.o.gutier

Re: [DRBD-user] Device is held open by someone

2013-02-27 Thread Phillips, Dan
We are getting these two errors after a manual failover (fairly easy to 
recreate):

Feb 27 09:23:47 jamaica-a kernel: EXT3-fs warning: maximal mount count reached, 
running e2fsck is recommended
(we know the max count is 20 now.)


Feb 26 00:36:03 jamaica-a kernel: drbd0: State change failed: Device is held 
open by someone


Dan Phillips


From: Phillips, Dan
Sent: Wednesday, February 27, 2013 2:07 PM
To: Dan Barker; drbd List (drbd-user@lists.linbit.com)
Cc: Phillips, Dan; Felipe Gutierrez
Subject: RE: [DRBD-user] Device is held open by someone

On our system:

[root@jamaica-a ~]# lsof | grep drbd
drbd0_wor 14557  root  cwd   DIR9,2 1024  2 /
drbd0_wor 14557  root  rtd   DIR9,2 1024  2 /
drbd0_wor 14557  root  txt   unknown
/proc/14557/exe
drbd0_rec 28332  root  cwd   DIR9,2 1024  2 /
drbd0_rec 28332  root  rtd   DIR9,2 1024  2 /
drbd0_rec 28332  root  txt   unknown
/proc/28332/exe
drbd0_ase 28333  root  cwd   DIR9,2 1024  2 /
drbd0_ase 28333  root  rtd   DIR9,2 1024  2 /
drbd0_ase 28333  root  txt   unknown
/proc/28333/exe

HERE are the processes that correspond to task IDs 14557, 28332, 28333

[root@jamaica-a ~]# ps -aux | grep 14557
Warning: bad syntax, perhaps a bogus '-'? See 
/usr/share/doc/procps-3.2.7.2.7/FAQ
root 14557  0.0  0.0  0 0 ?S06:57   0:02 [drbd0_worker]
root 30919  0.0  0.0   3852   600 pts/3S+   13:54   0:00 grep 14557

[root@jamaica-a ~]# ps -aux | grep 28332
Warning: bad syntax, perhaps a bogus '-'? See 
/usr/share/doc/procps-3.2.7.2.7/FAQ
root 28332  0.0  0.0  0 0 ?S07:04   0:01 
[drbd0_receiver]
root 31384  0.0  0.0   3848   588 pts/3S+   13:54   0:00 grep 28332

[root@jamaica-a ~]# ps -aux | grep 28333
Warning: bad syntax, perhaps a bogus '-'? See 
/usr/share/doc/procps-3.2.7.2.7/FAQ
root 28333  0.0  0.0  0 0 ?S07:04   0:02 [drbd0_asender]
root 31451  0.0  0.0   3848   592 pts/3S+   13:54   0:00 grep 28333

So drbd0_worker, drbd0_receiver and drbd0_asender have files open. After a 
failover, should these processes still have a device(s) held open?

What does this tell us?

Thanks,

Dan


From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Dan Barker
Sent: Wednesday, February 27, 2013 12:39 PM
To: drbd List (drbd-user@lists.linbit.com)
Subject: Re: [DRBD-user] Device is held open by someone

Well, that's who's got it open. Task 7354, 27005 and 27174. See which you may 
be able to stop or kill.

Dan

From: Felipe Gutierrez [mailto:felipe.o.gutier...@gmail.com]
Sent: Wednesday, February 27, 2013 11:46 AM
To: Dan Barker
Subject: Re: [DRBD-user] Device is held open by someone

root@cloud15:/home/cloud15# lsof | grep drbd
lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system 
/home/cloud15/.gvfs
  Output information may be incomplete.
drbd7_wor  7354root  cwd   DIR8,2   4096
  2 /
drbd7_wor  7354root  rtd   DIR8,2   4096
  2 /
drbd7_wor  7354root  txt   unknown  
/proc/7354/exe
drbd7_rec 27005root  cwd   DIR8,2   4096
  2 /
drbd7_rec 27005root  rtd   DIR8,2   4096
  2 /
drbd7_rec 27005root  txt   unknown  
/proc/27005/exe
drbd7_ase 27174root  cwd   DIR8,2   4096
  2 /
drbd7_ase 27174root  rtd   DIR8,2   4096
  2 /
drbd7_ase 27174root  txt   unknown  
/proc/27174/exe


On Wed, Feb 27, 2013 at 1:28 PM, Dan Barker 
dbar...@visioncomm.netmailto:dbar...@visioncomm.net wrote:
And what did lsof | grep drbd say?

From: 
drbd-user-boun...@lists.linbit.commailto:drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.commailto:drbd-user-boun...@lists.linbit.com]
 On Behalf Of Felipe Gutierrez
Sent: Wednesday, February 27, 2013 11:24 AM
To: Prater, James K.
Cc: drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com

Subject: Re: [DRBD-user] Device is held open by someone

Hi James,

even stoping Xen I couldn't umount my file system and set drbdadm secondary.

This is my output:

root@cloud15:/home/cloud15# umount /mnt/drbd7/
umount: /mnt/drbd7: device is busy.
(In some cases useful info about processes that use
 the device is found by lsof(8) or fuser(1))
root@cloud15:/home/cloud15# drbd-overview
  7:r7  StandAlone Primary/Unknown UpToDate/DUnknown r- /mnt/drbd7 ext3 23G 
8.3G 14G 39%


Any hint?
Thanks


On Wed, Feb

Re: [DRBD-user] Device is held open by someone

2013-02-26 Thread Phillips, Dan
We have been working on same/similar Device is held open by someone issue for 
some time now. Occurs on fairly regular basis upon manual failover.


[root@jamaica-a logs]# tail -f /var/log/messages
Feb  8 04:59:35 jamaica-a kernel: ide: failed opcode was: unknown
Feb  8 04:59:35 jamaica-a kernel: drbd0: State change failed: Device is held 
open by someone
Feb  8 04:59:35 jamaica-a kernel: drbd0:   state = { cs:Connected 
st:Primary/Secondary ds:UpToDate/UpToDate r--- }
Feb  8 04:59:35 jamaica-a kernel: drbd0:  wanted = { cs:Connected 
st:Secondary/Secondary ds:UpToDate/UpToDate r--- }

Dan Phillips


From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Felipe Gutierrez
Sent: Tuesday, February 26, 2013 7:50 AM
To: Prater, James K.
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Device is held open by someone

Like this:

# cat /etc/drbd.d/r7.res
resource r7 {
  on cloud15 {
device /dev/drbd7;
disk /dev/vg_7/lv_7;
address 192.168.188.15:7789http://192.168.188.15:7789;
meta-disk internal;
  }
  on cloud16 {
device /dev/drbd7;
disk /dev/vg_7/lv_7;
address 192.168.188.16:7789http://192.168.188.16:7789;
meta-disk internal;
  }
}
On Tue, Feb 26, 2013 at 9:49 AM, Prater, James K. 
jpra...@draper.commailto:jpra...@draper.com wrote:
How are the drbd volume(s) used?


James
From: 
drbd-user-boun...@lists.linbit.commailto:drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.commailto:drbd-user-boun...@lists.linbit.com]
 On Behalf Of Felipe Gutierrez
Sent: Tuesday, February 26, 2013 7:05 AM
To: drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com
Subject: [DRBD-user] Device is held open by someone

Hi everyone,

I am trying to do a failover system only with drbd. When my primary node get 
out of the network, the secondary node became primary and I mount the 
filesystem.
secondary# drbdadm primary r7
secondary# mount /dev/drbd7 /mnt/drbd7/

Until that every thing is ok.
At this time, my old primary node has to became the secondary and I have to 
discard my changes.
primary# umount -l /mnt/drbd7
primary# drbdadm secondary r7
7: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 7 secondary' terminated with exit code 11
primary# drbdadm -- --discard-my-data connect r7

Does anyone have a hint?
Thnaks in advance!
Felipe

--
--
-- Felipe Oliveira Gutierrez
-- felipe.o.gutier...@gmail.commailto:felipe.o.gutier...@gmail.com
-- https://sites.google.com/site/lipe82/Home/diaadia



--
--
-- Felipe Oliveira Gutierrez
-- felipe.o.gutier...@gmail.commailto:felipe.o.gutier...@gmail.com
-- https://sites.google.com/site/lipe82/Home/diaadia
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Expanding a cluster

2013-01-31 Thread Dan Barker
Justin: I would suggest:

Swap ALL drives in one server with 2T drives, build the new RAID array, let 
that sync.

You have a backup in the 1T drives you pulled.

Ditto for the Primary.

You miss the rebuild, you only do the sync. A rebuild reads EVERY sector, 
regardless of whether it's in use; just asking for a failure on that many 
drives - and - you want to do that 32 times! Please don't.

The only exposure in doing all 16 drives at one time is that there is a single 
copy of any changes that take place after you disconnect the servers until the 
sync completes. If a catastrophe occurs during that period, you have the 
original 16 drives as a fall back.

Another issue is you miss the opportunity to reorg into two, 8-drive arrays as 
Adam suggests. Hey, I bet all your current data will fit onto 8, 2T drives. You 
could do both at the same time.

Disconnect, pull 16 1T, add 16 2T, build 2 arrays of 8 drives each, sync drbd 
to only one of them. Switch to the other server, repeat on the first, and then 
migrate at your leisure half of the load from the first 8-disk array to the 
second.

Dan top poster in Atlanta

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Marcelo Pereira
Sent: Thursday, January 31, 2013 3:32 PM
To: Adam Goryachev; Justin Edmands
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Expanding a cluster

Hi Adam,

I'm sorry but it wasn't supposed to be an off-topic. I have been checking
all the phases of this process, and RAID is something I was checking as
well.

What I wanted to know was really the DRBD side, as I know that this
expansion will affect the block numbers and so on. That is why I wanted to
know if DRBD would handle it okay, and how!

Thanks ALL, for the messages! I will check the version numbers and publish
the results here. And I will RTM.

Thanks again,

--Marcelo

On 1/31/13 12:27 PM, Adam Goryachev
mailingli...@websitemanagers.com.au wrote:

On 01/02/13 04:04, Justin Edmands wrote:
 I'm on the fence about the amount of time it will take to degrade and
 rebuild a RAID6 at 16 drives (x2 systems).
 
 Anyone against the idea of:
 Backup data friday night through saturday morning
 stop drbd and heartbeat on node2
 replace all drives on node2
 build raid 6 and match setup/sizes from node1
 initialize metadata, etc.
 start drbd and heartbeat
 let it sync
 make node2 primary
 repeat steps for node1

In theory, the set of drives you pulled from the secondary are an extra
backup you could put all those drives back in, and make that set the
primary In some ways this might be a better solution, since you are
then simply doing a single large read on the primary, and a large write
on the secondary no raid rebuilds, except for the initial resync on
the secondary (which you might be able to skip since you know you will
write to every sector very soon when drbd does the sync).

1) Stop DRBD on secondary
2) Pull all drives on secondary
3) Add all drives on secondary and build new RAID6 array
4) Enable DRBD on secondary
5) sync from primary to secondary

Danger of read errors on the primary during this sync, but I would guess
this is better than doing 16 rebuild's

Personally, I would try to set the primary read-only during the process
(if an option) so that the spare set of drives is an exact match to
the primary (ie, they don't get outdated).

Depends on how much downtime can be scheduled

Finally, I think you have a fairly high risk with 16 drives in a single
RAID6, you might consider 2 sets of 8 drives in RAID6, and do a linear
concat of the two sets (or raid0). That allows you to lose any 2 out of
8 drives, instead of only 2 out of 16. Also, chances of URE on just one
of the remaining 14 drives after a 2 drive failure is not a good risk I
would want. Though depends on capacity requirements if you can use
another 2 drives to ensure you don't lose the data.

Just my 0.02c worth

At the end of the day, the direct answer to the original question was
RTFM, it really is a very nice manual, and you didn't tell us what
version of DRBD you use. The rest is really off-topic for this list,
maybe discuss on the linux-raid list if you are interested.

Regards,
Adam

 On Thu, Jan 31, 2013 at 11:20 AM, Adam Goryachev
 mailingli...@websitemanagers.com.au
 mailto:mailingli...@websitemanagers.com.au wrote:
 
 On 01/02/13 02:58, Marcelo Pereira wrote:
 Hello Everyone,

 I'm about to perform an upgrade on my servers and I was wondering
 how to do that.

 Here is the scenario:

 Server A has 16x 1Tb hard drives, under RAID-6.
 Server B has 16x 1Tb hard drives, under RAID-6.

 And both are in sync, using DRBD.

 I though about replacing the hard drives for 2Tb units, one by one.

 So, on each run, I would:

   * Remove a 1Tb disk
   * Add a 2Tb disk
   * Wait for it to rebuild the RAID

 After replacing ALL disks, I would expand the RAID unit

Re: [DRBD-user] bad side-effect: bug in inactive config stops other resources

2013-01-23 Thread Dan Barker
If those time stamps are believable, then the problem occurred before you 
started!

I don't know of drbd accessing the config decks before a drbdadm command, but 
it looks like you have proved that it happens. You copied, it activated! Very 
strange result. I'll let the drbd authors chime in on what new files in the 
config directory can cause before you request something using them, but it 
looks like the way to do this is to do the edit elsewhere, or Save As (:w 
filename).

Actually, you have pacemaker and heartbeat in the mix. I imagine they do watch 
directories for changes. It may not be drbd at all, but drbdadm commands 
requested by those tools.

Thanks for the warning!

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Helmut Wollmersdorfer
Sent: Wednesday, January 23, 2013 12:02 PM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] bad side-effect: bug in inactive config stops other 
resources



Am 23.01.2013 um 16:39 schrieb Dan Barker:

 conflicting use of resource section 'drbd8_1'

 Looks like you forgot to edit the config sections properly in the  
 vim ... step. Do the drbd10 decks still say drbd8?

Look at the time-stamps:

  1351  [2013-01-08 - 15:28:56] cd /etc/drbd.d
 # my usual way to configure new drbd-resources
  1352  [2013-01-08 - 15:29:29] cp -a drbd8_1.res drbd10_1.res

15:29:29

  1353  [2013-01-08 - 15:29:39] cp -a drbd8_2.res drbd10_2.res

15:29:39



 Jan  8 15:29:31 xen11 lrmd: [2403]: info: RA output:
 (xen_drbd5_1:1:monitor:stderr) drbd.d/drbd8_1.res:1: conflicting use
 of resource section 'drbd8_1' ...#012drbd.d/drbd10_1.res:1: resource
 section 'drbd8_1' first used here.

15:29:31

[...]

 Jan  8 15:29:37 xen11 Xen[32084]: INFO: Xen domain www will be stopped
 (timeout: 26s)
 Jan  8 15:29:37 xen11 Xen[32089]: INFO: Xen domain mail4 will be
 stopped (timeout: 26s)
 Jan  8 15:29:37 xen11 Xen[32086]: INFO: Xen domain typo3 will be
 stopped (timeout: 26s)
 # --

15:29:37

The resources stopped *before* the vim step, even 2 seconds before the  
2nd cp, 8 seconds after the 1st cp.

Helmut Wollmersdorfer

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Diagnosing a Failed Resource

2013-01-22 Thread Dan Barker
 However, I still have no idea what caused the failures.

A split brain is caused by writing to both members while they are disconnected. 
What in your environment caused that to occur is probably lost in logs a week 
gone. But, if your procedures always allow only one node (primary) to write to 
a resource, even if it’s disconnected, then split-brain won’t occur.

“nuke the whole thing” certainly worked. So would have following the doc to 
invalidate the secondary copy and then simply connect. There is an excellent 
chapter in the manual about split-brain.

Dan

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Eric
Sent: Monday, January 21, 2013 5:08 PM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Diagnosing a Failed Resource

I decided to nuke the whole thing and start over:

On both nodes, I...

snip

However, I still have no idea what caused the failures.

Ideas? Suggestions?

Eric Pretorious
Truckee, CA

bigsnip
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Diagnosing a Failed Resource

2013-01-21 Thread Dan Barker
The errors in connecting are logged. If you can't find them, attempt to connect 
a resource (drbdadm connect r1, for example) to create the errors again, and 
then look at the logs for the reason the connection was not established. The 
status will continue to show waiting for connection (WFC) but there will be a 
reason in the log files. If the logs are unclear, post the relevant portions 
back here and we'll help.

Something like 'dmesg | grep drbd'. You may want to do the logs on both drbd 
servers. You can do the connect command on either.

hth

Dan

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Eric
Sent: Monday, January 21, 2013 1:24 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Diagnosing a Failed Resource

I've configured corosync+pacemaker to managee a simple two-resource DRBD 
cluster:

 san1:~ # crm configure show | cat -
 node san1 \
 attributes standby=off
 node san2 \
 attributes standby=off
 primitive p_DRBD-r0 ocf:linbit:drbd \
 params drbd_resource=r0 \
 op monitor interval=60s
 primitive p_DRBD-r1 ocf:linbit:drbd \
 params drbd_resource=r1 \
 op monitor interval=60s
 primitive p_IP-1_253 ocf:heartbeat:IPaddr2 \
 params ip=192.168.1.253 cidr_netmask=24 \
 op monitor interval=30s
 primitive p_IP-1_254 ocf:heartbeat:IPaddr2 \
 params ip=192.168.1.254 cidr_netmask=24 \
 op monitor interval=30s
 primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget \
 params iqn=iqn.2012-11.com.example.san1:sda \
 op monitor interval=10s
 primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit \
 params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 
 path=/dev/drbd0 \
 op monitor interval=10s
 primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit \
 params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 
 path=/dev/drbd1 \
 op monitor interval=10s
 primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit \
 params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 
 path=/dev/drbd2 \
 op monitor interval=10s
 primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit \
 params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 
 path=/dev/drbd3 \
 op monitor interval=10s
 primitive p_iSCSI-san2 ocf:heartbeat:iSCSITarget \
 params iqn=iqn.2012-11.com.example.san2:sda \
 op monitor interval=10s
 primitive p_iSCSI-san2_0 ocf:heartbeat:iSCSILogicalUnit \
 params target_iqn=iqn.2012-11.com.example.san2:sda lun=0 
 path=/dev/drbd1000 \
 op monitor interval=10s
 primitive p_iSCSI-san2_1 ocf:heartbeat:iSCSILogicalUnit \
 params target_iqn=iqn.2012-11.com.example.san2:sda lun=1 
 path=/dev/drbd1001 \
 op monitor interval=10s
 primitive p_iSCSI-san2_2 ocf:heartbeat:iSCSILogicalUnit \
 params target_iqn=iqn.2012-11.com.example.san2:sda lun=2 
 path=/dev/drbd1002 \
 op monitor interval=10s
 primitive p_iSCSI-san2_3 ocf:heartbeat:iSCSILogicalUnit \
 params target_iqn=iqn.2012-11.com.example.san2:sda lun=3 
 path=/dev/drbd1003 \
 op monitor interval=10s
 group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 
 p_iSCSI-san1_3 p_IP-1_254
 group g_iSCSI-san2 p_iSCSI-san2 p_iSCSI-san2_0 p_iSCSI-san2_1 p_iSCSI-san2_2 
 p_iSCSI-san2_3 p_IP-1_253
 ms ms_DRBD-r0 p_DRBD-r0 \
 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 
 notify=true
 ms ms_DRBD-r1 p_DRBD-r1 \
 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 
 notify=true
 location l_iSCSI-san1_and_DRBD-r0 p_IP-1_254 10240: san1
 location l_iSCSI-san2_and_DRBD-r1 p_IP-1_253 10240: san2
 colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master
 colocation c_iSCSI_with_DRBD-r1 inf: g_iSCSI-san2 ms_DRBD-r1:Master
 order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start
 order o_DRBD-r1_before_iSCSI-san2 inf: ms_DRBD-r1:promote g_iSCSI-san2:start
 property $id=cib-bootstrap-options \
 dc-version=1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf \
 cluster-infrastructure=openais \
 expected-quorum-votes=2 \
 stonith-enabled=false \
 no-quorum-policy=ignore

The cluster appears to be functioning correctly:

 san1:~ # crm_mon -1
 
 Last updated: Sun Jan 20 22:20:17 2013
 Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
 Stack: openais
 Current DC: san1 - partition with quorum
 Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
 2 Nodes configured, 2 expected votes
 16 Resources configured.
 

 Online: [ san1 san2 ]

  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
  Masters: [ san1 ]
  Slaves: [ san2 ]
  Resource Group: g_iSCSI-san1
  p_iSCSI-san1(ocf::heartbeat:iSCSITarget):Started san1
  p_iSCSI-san1_0(ocf::heartbeat:iSCSILogicalUnit):Started san1
  p_iSCSI-san1_1(ocf::heartbeat:iSCSILogicalUnit):Started san1
  p_iSCSI-san1_2(ocf::heartbeat:iSCSILogicalUnit):Started san1
  p_iSCSI-san1_3

Re: [DRBD-user] DRDB stalled and impossible restart, down...

2013-01-13 Thread Dan Barker
I have no idea what the problem might be. But I have an idea to un-hang drbd. 
If you go on the primary node and disconnect the resource (drbdadm r1 
disconnect), maybe the processes on the secondary will respond. Saves a boot. 

Are you certain about the reliability of the network layer between the drbd 
hosts?

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Abdelkarim Mateos 
Sanchez
Sent: Sunday, January 13, 2013 2:18 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] DRDB stalled and impossible restart, down...

Hi.

Any reply for this question?

I'm desolate.

In this machine, every week I need reboot server because DRBD it's hung down.

Example.

cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 
2012-10-09 12:47:51
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
ns:0 nr:0 dw:0 dr:335534008 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:VerifyS ro:Secondary/Primary ds:UpToDate/UpToDate C r-
ns:0 nr:1309972 dw:1309972 dr:51448776 al:0 bm:88 lo:1 pe:136721 ua:2048 
ap:0 ep:1 wo:b oos:9459536
[] verified:  4.4% (48996/51196)M
finish: 16317:48:56 speed: 0 (0) want: 40,960 K/sec (stalled)
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
ns:0 nr:0 dw:0 dr:209708728 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b 
oos:28051588
 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
ns:0 nr:0 dw:0 dr:209708728 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b 
oos:90437120
root@pro01:~# /sbin/drbdadm verify all

Command '/sbin/drbdsetup 0 verify' did not terminate within 5 seconds
root@pro01:~# 
root@pro01:~# No response from the DRBD driver! Is the module loaded?

I like shutdown drbd, can't do it.
I like detach r1, can't do it..

Desolate.




El 11/01/2013, a las 10:36, Abdelkarim Mateos Sanchez abk...@tamainut.com 
escribió:

 Hi.
 
 I'm desolate.
 
 With DRBD 8.3 (latest minor version) on Proxmox 2.2 r1.res stalled
 
 at /proc/drbd 
 version: 8.3.13 (api:88/proto:86-96)
 GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 
 2012-10-09 12:47:51
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
   ns:0 nr:0 dw:44628 dr:335534008 al:0 bm:39 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b 
 oos:0
 1: cs:VerifyS ro:Secondary/Primary ds:UpToDate/UpToDate C r-
   ns:0 nr:52427164 dw:52427164 dr:3246072 al:0 bm:3200 lo:1 pe:145893 ua:2048 
 ap:0 ep:1 wo:b oos:1309972
   [...] verified:  6.2% (48036/51196)M
   finish: 755:24:58 speed: 16 (96) want: 40,960 K/sec (stalled)
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
   ns:0 nr:23024700 dw:127879064 dr:104854364 al:0 bm:8866 lo:0 pe:0 ua:0 ap:0 
 ep:1 wo:b oos:0
 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
   ns:0 nr:79852364 dw:184706728 dr:104854364 al:0 bm:11415 lo:0 pe:0 ua:0 
 ap:0 ep:1 wo:b oos:0
 
 
 I like disconnect, down resource, any solution for this situation.
 
 Bat all get a timeout
 
 cat /var/lock/drbd-147-1
 lock on /var/lock/drbd-147-1 currently held by pid:591161
 State change failed: (0)unknown error.
 change failed: (0)unknown error.
 
 service drbd restart
 Stopping all DRBD resources:
 
 No response from the DRBD driver! Is the module loaded?
 
 No response from the DRBD driver! Is the module loaded?
 
 But---
 lsmod | grep drbd
 drbd  342496  13
 
 
 Dec 31 17:52:31 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg 
 time expired, ko = 4294961767
 Dec 31 17:52:37 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg 
 time expired, ko = 4294961766
 Dec 31 17:52:43 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg 
 time expired, ko = 4294961765
 Dec 31 17:52:49 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg 
 time expired, ko = 4294961764
 Dec 31 17:52:55 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg 
 time expired, ko = 4294961763
 Dec 31 17:53:01 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg 
 time expired, ko = 4294961762
 
 
 Try kill process, not work
 
 ps aux |grep  drbd1
 root   20189  0.0  0.0  0 0 ?SDec28   0:17 
 [drbd1_worker]
 root   20207  0.0  0.0  0 0 ?SDec28   3:21 
 [drbd1_receiver]
 root   20213  0.0  0.0  0 0 ?SDec28   0:16 
 [drbd1_asender]
 
 Apreciate help
 
 
 Abdelkarim Mateos Sánchez
 CEO Tamainout Hébergement, S.A.R.L. (Marruecos)
 CET Tamainut IT, S.L. (España) 
 Contacto | abk...@tamainut.com | Skype - mamateos 
 Teléfono Fijo España: +34.851000209 | Marruecos Móvil: +212.671819412
 islaserver.com | tamainut.tel 
 Este mensaje se dirige exclusivamente a su destinatario y puede contener 
 información privilegiada o confidencial. Si no es vd. el destinatario 
 indicado, queda notificado de que la

Re: [DRBD-user] Too many block drbdX: Out of sync

2013-01-12 Thread Dan Barker
 the execution of /sbin/drbdadm verify all errors are corrected.

This is incorrect. Verifyall errors are identified. They are not corrected. To 
correct them, disconnect and reconnect. They are corrected at connect time.

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Abdelkarim Mateos 
Sanchez
Sent: Friday, January 11, 2013 4:36 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Too many block drbdX: Out of sync

Hi

I use DRBD Proxmox VE 8.3 with 2.2 and a configuration Primary / Secondary on 
network computers trajeta Gigabyte

Basically I use it to keep a copy of the LV of the primary machine.

When I run, /sbin/drbdadm verify all I find that there are many messages like 
this::

Dec 31 09:18:04 pro01 kernel: block drbd2: Out of sync: start=147170512, 
size=248 (sectors)
Dec 31 09:18:04 pro01 kernel: block drbd2: Out of sync: start=147195176, 
size=2504 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147218992, 
size=11472 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147230472, 
size=104 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147231992, 
size=1200 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147233280, 
size=96 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147233408, 
size=16 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147233560, 
size=32 (sectors)
Dec 31 09:18:07 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg 
time expired, ko = 4294966911
Dec 31 09:18:09 pro01 kernel: block drbd2: Out of sync: start=147493960, size=8 
(sectors)

Use in Global Configuration

 syncer {
   # rate after al-extents use-rle cpu-mask verify-alg csums-alg


 
  verify-alg sha1;
  rate 40M;
   }

In each resource

resource r1 {
 # kvm420
 protocol C;
 startup {
   wfc-timeout  15; # non-zero wfc-timeout can be dangerous 
(http://forum.proxmox.com/threads/3465-Is-it-safe-to-use-wfc-timeout-in-DRBD-configuration)
   degr-wfc-timeout 60;
 }
 net {
   cram-hmac-alg sha1;
   shared-secret CDwCYfY7420s;
   after-sb-0pri discard-zero-changes;
   after-sb-1pri discard-secondary;
   after-sb-2pri disconnect;
 }
 on pro01 {
   device /dev/drbd1;
   disk /dev/sata/vm-420-disk-1;
   address XXX.XXX.XXX.XXX:7789;
   meta-disk internal;
 }
 on pro02 {
   device /dev/drbd1;
   disk   /dev/pve/vm-420-disk-1;
   address XXX.XXX.XXX.XXX:7789;
   meta-disk internal;
 }
}


Nor do I understand correctly, if the execution of /sbin/drbdadm verify all 
errors are corrected.

Help is appreciated, as I am somewhat new to the suod and DRBD and I apologize 
for my English of Google translator.

A happy end of year...









resource r1 {
 # kvm420
 protocol C;
 startup {
   wfc-timeout  15; # non-zero wfc-timeout can be dangerous 
(http://forum.proxmox.com/threads/3465-Is-it-safe-to-use-wfc-timeout-in-DRBD-configuration)
   degr-wfc-timeout 60;
 }
 net {
   cram-hmac-alg sha1;
   shared-secret CDwCYfY7420s;
   after-sb-0pri discard-zero-changes;
   after-sb-1pri discard-secondary;
   after-sb-2pri disconnect;
 }
 on pro01 {
   device /dev/drbd1;
   disk /dev/sata/vm-420-disk-1;
   address XXX.XXX.XXX.XXX:7789;
   meta-disk internal;
 }
 on pro02 {
   device /dev/drbd1;
   disk   /dev/pve/vm-420-disk-1;
   address XXX.XXX.XXX.XXX:7789;
   meta-disk internal;
 }
}


Abdelkarim Mateos Sánchez
CEO Tamainout Hébergement, S.A.R.L. (Marruecos)
CET Tamainut IT, S.L. (España) 
Contacto | abk...@tamainut.com | Skype - mamateos 
Teléfono Fijo España: +34.851000209 | Marruecos Móvil: +212.671819412
islaserver.com | tamainut.tel 
Este mensaje se dirige exclusivamente a su destinatario y puede contener 
información privilegiada o confidencial. Si no es vd. el destinatario indicado, 
queda notificado de que la utilización, divulgación y/o copia sin autorización 
está prohibida en virtud de la legislación vigente. Si ha recibido este mensaje 
por error, le rogamos que nos lo comunique inmediatamente por esta misma vía y 
proceda a su destrucción. 
This message is intended exclusively for its addresse and may contain 
information that is CONFIDENTIAL and protected by professional privilege. If 
you are not the intended recipient you are hereby notified that any 
dissemination, copy or disclosure of this communication is strictly prohibited 
by law. If this message has been received in error, please immediately notify 
us via e-mail and delete it.

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

[DRBD-user] drbd0: State change failed: Device is held open by someone

2012-11-15 Thread Phillips, Dan
Problem:

The problem is that when performing an HA failover from server A to server B,  
a DRBD resource is sometimes not shut down properly on server A. Several 
attempts are made to stop the DRBD resource, but finally it gives up and the 
server is rebooted. The failover to server B works properly; B becomes the 
Active server. After the reboot, server A comes up properly as the Standby 
server.

The problem is intermittent. Most HA failovers work as expected (server A does 
not reboot).

When the problem does occur, the following lines are logged in 
/var/log/messages and displayed on the OOBM:

drbd0: State change failed: Device is held open by someone
drbd0:   state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- }
drbd0:  wanted = { cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate 
r--- }

Heartbeat: 2.1.4

Drbd: 8.0.11

kernel-module-drbd: 2.6.18

lvm2: 2.02.42-5


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] very large out-of-sync (oos) value yet drbd-overview claims UpToDate/UpToDate

2012-11-01 Thread Dan Barker
There is an on-error event handler. Mine sends me email if verify fails
(runs weekly, one resource each of M, Tu, W, Th nights).

Dan

In my Global handlers section:

out-of-sync  /usr/lib/drbd/notify-out-of-sync.sh myemail;



-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lonni J Friedman
Sent: Wednesday, October 31, 2012 6:02 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] very large out-of-sync (oos) value yet drbd-overview
claims UpToDate/UpToDate

I've got a drbd setup with 8.3.11.  I ran a manual verify, and once it
completed it reported:

[23479.620066] block drbd0: Online verify  done (total 23136 sec;
paused 0 sec; 73748 K/sec)
[23479.702176] block drbd0: Online verify found 9651098 4k block out of
sync!
[23479.745988] block drbd0: conn( VerifyT - Connected )
[23479.788996] block drbd0: helper command: /sbin/drbdadm out-of-sync
minor-0
[23479.839348] block drbd0: helper command: /sbin/drbdadm out-of-sync
minor-0 exit code 0 (0x0)
[23479.961245] block drbd0: bitmap WRITE of 2763 pages took 34 jiffies
[23480.006527] block drbd0: 37 GB (9651098 bits) marked out-of-sync by
on disk bit-map.

This isn't entirely surprising, as the secondary node was down for a
long time due to hardware problems.  However, what is surprising is
that drbd-overview still reports that everything is UpToDate:
$ drbd-overview
  0:sdb  Connected Secondary/Primary UpToDate/UpToDate C r-

Shouldn't this huge number of out of sync bits cause drbd-overview to
report something other than UpToDate for the Secondary node?   If not,
then how does one actually programattically detect that a verification
has failed?  Parsing dmesg is going to be a huge kludge, and not
likely to be reliable.

thanks
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] very large out-of-sync (oos) value yet drbd-overview claims UpToDate/UpToDate

2012-11-01 Thread Dan Barker
I don't know anything about drbd-overview, I just cat /proc/drbd.

But, I bet it's echoing the same information.

drbd keeps all the bytes in sync that it knows about (UpToDate). The changes it 
doesn't know about are found by verify. Disconnect/Connect syncs them back up.

If you start with dirty disks, set up drbd and do not sync them, and mkfs a 
file system on the primary, the disks will be absolutely UpToDate in the blocks 
that matter for the file system, and horribly out of sync in the blocks that 
don't matter to anybody at all. Verify will find the oos blocks and mark them 
for syncing, but the hypothetical file system is still consistent.

Just do the Disconnect/Connect and you'll have oos zero AND UpToDate.

Dan

-Original Message-
From: Lonni J Friedman [mailto:netll...@gmail.com] 
Sent: Thursday, November 01, 2012 4:31 PM
To: Dan Barker
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] very large out-of-sync (oos) value yet drbd-overview 
claims UpToDate/UpToDate

Thanks, that answers my 2nd question, but not my 1st question.
Shouldn't drbd-overview be treating this as a not UpToDate scenario?

On Thu, Nov 1, 2012 at 6:08 AM, Dan Barker dbar...@visioncomm.net wrote:
 There is an on-error event handler. Mine sends me email if verify 
 fails (runs weekly, one resource each of M, Tu, W, Th nights).

 Dan

 In my Global handlers section:

 out-of-sync  /usr/lib/drbd/notify-out-of-sync.sh myemail;



 -Original Message-
 From: drbd-user-boun...@lists.linbit.com
 [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lonni J 
 Friedman
 Sent: Wednesday, October 31, 2012 6:02 PM
 To: drbd-user@lists.linbit.com
 Subject: [DRBD-user] very large out-of-sync (oos) value yet 
 drbd-overview claims UpToDate/UpToDate

 I've got a drbd setup with 8.3.11.  I ran a manual verify, and once it 
 completed it reported:

 [23479.620066] block drbd0: Online verify  done (total 23136 sec; 
 paused 0 sec; 73748 K/sec) [23479.702176] block drbd0: Online verify 
 found 9651098 4k block out of sync!
 [23479.745988] block drbd0: conn( VerifyT - Connected ) 
 [23479.788996] block drbd0: helper command: /sbin/drbdadm out-of-sync
 minor-0
 [23479.839348] block drbd0: helper command: /sbin/drbdadm out-of-sync
 minor-0 exit code 0 (0x0)
 [23479.961245] block drbd0: bitmap WRITE of 2763 pages took 34 jiffies 
 [23480.006527] block drbd0: 37 GB (9651098 bits) marked out-of-sync by 
 on disk bit-map.

 This isn't entirely surprising, as the secondary node was down for a 
 long time due to hardware problems.  However, what is surprising is 
 that drbd-overview still reports that everything is UpToDate:
 $ drbd-overview
   0:sdb  Connected Secondary/Primary UpToDate/UpToDate C r-

 Shouldn't this huge number of out of sync bits cause drbd-overview to
 report something other than UpToDate for the Secondary node?   If not,
 then how does one actually programattically detect that a verification 
 has failed?  Parsing dmesg is going to be a huge kludge, and not 
 likely to be reliable.

 thanks

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Versions

2012-10-24 Thread Dan Barker
I don't understand the hubbub over compiling this thing. My first DRBD was
on Linux from scratch, a distribution where everything is done by hand, so
there was no package manager availaable. I found the install quite simple.
My most recent upgrade consisted of these easy steps, and took about 2
minutes. Selecting the kernel parameters for your environment (step 5) might
take a little more time, but not counting backups, I'd say 10 minutes ought
to do it.

There are some prereqs, but they are probably on your system already: make
gcc libc6 flex linux-headers-`uname -r` libc6-dev libssl-dev.

1. cd /usr/src/
2. wget http://oss.linbit.com/drbd/8.4/drbd-8.4.2.tar.gz
3. tar -xzf drbd-8.4.2.tar.gz
4. cd drbd-8.4.2/
5. ./configure --with-km --prefix /usr --sysconfdir /etc --localstatedir
/var
6. make clean all
7. make install  

I hope seeing it laid out in all its simplicity encourages you to give it a
try. Heck, fire up a virtual machine or two and experiment. That's the fun
part of our jobs anyhow.

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Adam Goryachev
Sent: Tuesday, October 23, 2012 6:51 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] DRBD Versions

Hi,

I've been using DRBD on a couple of systems for a while, and have just used
the version that came with my distro (debian stable (squeeze)) since I
really don't like to maintain compiling and installing from source, and
managing (remembering) to upgrade, recompile, etc each time there is a new
version.

However, more and more, it would seem that my current 8.3.7 (debian package
2:8.3.7-2.1) is probably missing a lot of bug fixes, but on checking, debian
testing only has 8.3.13, and even debian unstable has only 8.3.13.

So the question is, should I just bite the bullet and install DRBD from
source?

I notice from http://www.drbd.org/download/packages/ that DRBD is integrated
into the vanilla kernel 2.6.33 or newer.
If I upgrade my debian stable kernel (2.6.32) to a newer version (either
debian testing or debian-backports) 3.2 based kernel, can I then just
download, compile, and install the latest 8.4.x version of DRBD?

Thank you for your suggestions/comments

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] IO Error Logging

2012-10-07 Thread Dan Barker
My system logs them with timestamps. They just happen to be relative to boot
time. I dmtime say dmtime 12345 and see the real time stamp.

cat  /usr/local/bin/dmtime  EOF
date --date=@$((($(date --date=$(ls -ld  --time-style=+%Y-%m-%d %H:%M
/proc/1|awk '{print $6,$7}') +%s) + $1)))
EOF

You may need to muck with the script to make it match your system's
peculiarities; shown is for Debian.

Dan

-Original Message-
From: Felix Frank [mailto:f...@mpexnet.de] 
Sent: Sunday, October 07, 2012 9:09 AM
To: Andrew Eross
Cc: Dan Barker; drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] IO Error Logging

Hi,

On 10/06/2012 03:38 AM, Andrew Eross wrote:
 Below is what I'm seeing in dmesg.

No timestamps? Bummer.

Does your system log those via syslog too (in Debian, typicalle
/var/log/kern.log)? That log typically has far superior timestamps even.

Cheers,
Felix

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Does oversize disk hurt anything?

2012-10-07 Thread Dan Barker
-Original Message-
From: Florian Haas [mailto:flor...@hastexo.com]
Sent: Sunday, October 07, 2012 5:46 PM
To: Dan Barker
Cc: drbd List
Subject: Re: [DRBD-user] Does oversize disk hurt anything?

On Sun, Oct 7, 2012 at 2:20 PM, Dan Barker dbar...@visioncomm.net wrote:
Well if you had created a partition (/dev/sdc1) rather than use the full 
disk (/dev/sdc), then you could have set up that partition to match the size 
of the disk on your primary.

 Partition. Great idea. If I had thought of that, I'd have bought only one 
 new 500G disk instead of two. Thanks for the hint. 1T disks cost the same as 
 500G these days.

The physical device sizes differing isn't a problem at all; DRBD will just 
select the smaller size of the two.
I know drbd is just using the outside 500G of the oversize disk. It's just that 
the metadata is in near the hub. A partition would have placed it mid-disk but 
I didn't think of that.

Why? Your cluster manager (typically Pacemaker) should take care of that for 
you.

 No cluster manager, no NA. Easy manual failover. This is a lab environment 
 and HA is not really needed. The users of drbd storage are ESXi hosts. To 
 take the primary server off line I:
 DrbdR0: drbdadm primary all (allow dual primaries is on)
 DrbdR0: start iet
 ESXi (all): verify all four paths to both drbd are online

We may have had this discussion before, but:
http://fghaas.wordpress.com/2011/11/29/dual-primary-drbd-iscsi-and-multipath-dont-do-that/

 Thanks for the help.

Pleasure.

Cheers,
Florian

Of course I've been following the dont-do-that threads. I've been down that 
path several times. It works great for a while and then doesn'tg. But that 
was a couple of years ago.

What I am currently doing is different; the exposure is very brief, if at all.

When the second DRBD publishes its iSCSI paths, ESXi discovers them but 
continues to use the original path for all I/O. It's not concurrent multipath. 
Only when the original path dies (when I stop iet on the primary drbd) does 
ESXi switch to active I/O on the other path.

I think your fears are about simultaneous dual-access, not about what I'm 
doing. I don't think I'd recommend anyone else do it this way, it's just the 
way I'm doing it with the hardware laying around.

Thanks for the feedback. Here's some feedback for you: drbd is Great! Thanks 
for making it available. Best wishes for you at Hastexo.

Dan

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Does oversize disk hurt anything?

2012-10-05 Thread Dan Barker
I just lost a disk on my secondary node. I looked EVERYWHERE and can't find
the spare disks I bought for such an occurrence. So, I put in a handy disk,
twice the size.

drbdadm create-md r1
drbdadm attach r1

and off we go.

If memory serves, create-md will build a meta-data at the END of the disk.
Won't that cause a lot of seek to the hub when seeking to about the middle
of the platters would have done the trick, had the metadata been at the same
offset as the primary?

Dan

version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@DrbdR0,
2012-05-28 12:09:30 (Yes, I know. I need to upgrade).

Failed disk: WD 500G
Replaced by: WD 1T
On server: DrbdR0

cat /etc/drbd.d/r1.res
resource r1 {
on DrbdR0 {
volume 0 {
device   /dev/drbd1 minor 1;
disk /dev/sdc;
meta-diskinternal;
}
address  ipv4 10.20.30.46:7790;
}
on DrbdR1 {
volume 0 {
device   /dev/drbd1 minor 1;
disk /dev/sdc;
meta-diskinternal;
}
address  ipv4 10.20.30.47:7790;
}
startup {
become-primary-on DrbdR1;
}
}



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] IO Error Logging

2012-10-05 Thread Dan Barker
dmesg | grep sr1 should show you all you need to know.

 

Dan (there's that word should againg)

 

From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Andrew Eross
Sent: Friday, October 05, 2012 2:17 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] IO Error Logging

 

Hi guys,

 

I'm trying to debug a SSD drive that's the backing device for my secondary
node.

 

The primary/secondary are sync'd (protocol C) and everything goes fine until
I get to testing fail-over, e.g.on the primary drbdadm secondary drbd-sr1,
and on the secondary drbdadm primary drbd-sr1.

 

When I do this the secondary locks up for about 5 minutes (SSH session
drops) then it starts responding again and I see drbd has now dropped into
diskless mode.

 

I'm thinking there might be IO errors occurring with the underlying disk and
perhaps drbd is automatically detaching it.

 

Right now I'm running badblocks on the backing device and seeing if it can
find any problems.

 

In the meantime I've been trying to figure out how to get more information
about IO errors from drbd.

 

My devices are configured with detach as recommended
(http://www.drbd.org/users-guide/s-configure-io-error-behavior.html),
however, I'm not sure how to find out more information about when this event
occurs.

 

Are there any debugging options I can enable that would help me see IO error
details that caused a detach? 

 

Thanks!

Andrew

 

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Status Mismatch

2012-09-19 Thread Dan Barker
The sync hasn't finished. It's at 100%, but still doing cleanup at
end-of-task. When it completes, you'll see the correct status. Inconsistent
is the VALID status until the sync finishes. When the progress bar goes
away, it's really done. Check the logs if you think it's hung there too
long.

 

Dan

 

From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of J.R. Lillard
Sent: Wednesday, September 19, 2012 9:46 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Status Mismatch

 

What would cause two nodes to show different statuses?

 

Primary

 

10: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent A r-

ns:154708236 nr:0 dw:155723616 dr:478644485 al:1603514 bm:41448 lo:1
pe:180 ua:0 ap:179 ep:1 wo:f oos:20

[===] sync'ed:100.0% (20/21460)K

finish: 0:00:00 speed: 16 (16) K/sec

 

Secondary

 

10: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r-

ns:0 nr:114229240 dw:114229240 dr:0 al:0 bm:26155 lo:0 pe:0 ua:0 ap:0
ep:1 wo:f oos:0

 

-- 
J.R. Lillard

System / Network Admin

Web Programmer

Golden Heritage Foods

120 Santa Fe St.

Hillsboro, KS  67063

 

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] What to do about read errors on the primary?

2012-09-18 Thread Dan Barker
I have read errors on the primary side, which caused the secondary to go
into an inconsistent state.

It's a shame you lost the logs. They would have said much.

When drbd loses a primary disk, it continues to work, read/write, using the
secondary disk. The active node will remain primary, the standby node will
remain secondary, but the disk state will be diskless/uptodate. All I/O is
going over the wire now, reads and writes; not just writes as is the normal
(uptodate/uptodate) case.

You have described a result different than that, so the precipitating events
must be different too.

hth

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Alan Robertson
Sent: Tuesday, September 18, 2012 12:06 PM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] What to do about read errors on the primary?

There was another note mentioning backups...
DRBD is designed to protect against server and disk failures.  Backups
primarily protect against human errors, disasters and so on - and I do
have backups... Snarky comments aren't very helpful and don't have
much place in civil discourse except maybe with your friends.   The fact
that you don't want your system to recover from I/O errors is your choice.
I'm funny that way -  I want my system to do all it can to recover from
problems, and minimize data loss...

In this case, I have a disk failure which I am having trouble getting DRBD
to protect me against.  I'm perfectly willing to accept that I should have
configured things differently - which would be why I came here asking for
help.  In the 10+ years I've been using and recommending DRBD, it's never
come up for me before.

On 09/18/2012 05:16 AM, Lars Ellenberg wrote:

 Alan Robertson al...@unix.sh schrieb:

 I have read errors on the primary side, which caused the secondary to 
 go into an inconsistent state.  This means that the disk which 
 desperately needs backing up, is no longer being backed up (!).

 In an ideal world, it seems to me what one would like for DRBD to do 
 would be:
get the data from the secondary
write it to the primary - which often fixes read errors
continue on syncing everything else to the secondary
 Well, we don't do this yet.
 We detach the faulty disk, and resync when you reattach.

 Platform, kernel version, drbd version, configuration and logs...
  ;-)
I actually figured you'd just tell me what I needed to change - so I didn't
go grab them the first time.  Nevertheless - mea culpa... ;-)

The original occurrence is lost to antiquity, unfortunately so logs could
only be recent - not when it first happened.  I included a good bit of
recent logs from both sides.  I grepped out drbd issues.  Let me know what
else you want.

What _seems_ to have happened, is that the primary continued on, and the
secondary became inconsistent because the two sides were disconnected. 
Attempts to resync the two failed because of the read error on the primary -
making it impossible to switch to the secondary using normal methods.


Linux paul 2.6.38-15-server #66-Ubuntu SMP Tue Aug 14 17:42:23 UTC 2012
x86_64 x86_64 x86_64 GNU/Linux
drbd8-utils
2:8.3.9-1ubuntu1   RAID 1 over tcp/ip for Linux
utilities

$ cat /proc/drbd (on 'paul' - primary for the problematic partition)
version: 8.3.9 (api:88/proto:86-95)
srcversion: 8925C35502BC976C622CF7A
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/Inconsistent C r-
ns:3173100 nr:0 dw:3244204 dr:20104349 al:5775 bm:836 lo:0 pe:0 ua:0
ap:0 ep:1 wo:f oos:512
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
ns:1420 nr:5328 dw:6780 dr:6866 al:0 bm:178 lo:0 pe:0 ua:0 ap:0 ep:1
wo:f oos:0


paul:/etc/drbd.d $ cat home.res  (this is the partition with problems)
resource home {
device /dev/drbd0;
meta-disk internal;
on paul {
disk /dev/paul/home;
address 10.1.1.31:7789;
}
on silas {
disk /dev/silas/home;
address 10.1.1.32:7789;
}
}

paul:/etc/drbd.d $ cat etc.res  (this partition is a happy camper) resource
etc {
device /dev/drbd1;
meta-disk internal;
on paul {
disk /dev/paul/etc;
address 10.1.1.31:7790;
}
on silas {
disk /dev/silas/etc;
address 10.1.1.32:7790;
}
}



paul:/etc/drbd.d $ cat global_common.conf global {
usage-count yes;
# minor-count dialog-refresh disable-ip-verification

Re: [DRBD-user] What to do about read errors on the primary?

2012-09-18 Thread Dan Barker
shot myself in the foot somewhere along the line

I'm glad you don't need any help on that subject. I have much experience
shooting my own foot; I'm glad I don't need to share them with youg.

If the primary's disk is the best you've got, and it's worth some file
corruption (drbd abhors any single-bit difference from primary to
secondary), I think the best course (which will probably crash and burn) is
to dd the contents of the primary disk to a new, hopefully identical disk.
On error, dd will probably stop. You can then restart it beyond the bad
spot with seek=. After less than 200 trys, you'll have a copy of the
readable blocks on a disk which will run with no read errors, although there
will be junk in the places that were bad before.

Mount that to drbd, mount the secondary discard-my-data and let them sync
up. Then  fsck and hold on to your shorts.

AFAICT, that's going to be your best (only?) shot.

Not knowing what you did this time makes it difficult to direct you not to
do that again, but I'm going to try, Don't do that again.

A simple suggestion is to do a weekly verify with email to you if anything
is amuck. Of course, even that can fail. No email means no verify error, but
it doesn't mean the CPU didn't overheat and shutdown one of the nodes
(happened to me a couple weeks back. $2 fan).

hth

Dan

-Original Message-
From: Alan Robertson [mailto:al...@unix.sh] 
Sent: Tuesday, September 18, 2012 1:24 PM
To: Dan Barker
Subject: Re: [DRBD-user] What to do about read errors on the primary?

On 09/18/2012 10:24 AM, Dan Barker wrote:
 I have read errors on the primary side, which caused the secondary to 
 go into an inconsistent state.

 It's a shame you lost the logs. They would have said much.

 When drbd loses a primary disk, it continues to work, read/write, 
 using the secondary disk. The active node will remain primary, the 
 standby node will remain secondary, but the disk state will be 
 diskless/uptodate. All I/O is going over the wire now, reads and 
 writes; not just writes as is the normal
 (uptodate/uptodate) case.

 You have described a result different than that, so the precipitating 
 events must be different too.

Thanks for the description of how it's supposed to work in this case.  I
didn't really know.

I may have shot myself in the foot somewhere along the line too...  I
certainly wouldn't count that out. :-D

The reason why the logs were lost is that I didn't notice for a long time...
It could have been many months.  This is my home system.  It's actually been
many years since I had a disk failure...

What I noticed was that some failover tests I was performing didn't work
- it insisted on leaving things on the (now-broken) primary side.  I then
noticed the DRBD state wasn't in sync (and even that was a month or so ago -
life has been busy).  I tried to bring them into sync using a variety of
techniques that didn't work. _Then_ I noticed the I/O errors.

The I/O errors are near the end of the disk.  I wonder if some of the I/O
errors were in the bitmap?

But after screwing around, and probably shooting myself in the foot, I'd
like for the two sides to continue to try and stay in sync as much as they
can.  I don't want the synchronization to stop just because there might be
an I/O error on one block.  Or at least, I _think_ that's what I want.  [In
my case, of course, it was a lot more than one block - but less than 200].

In my case, the only absolutely up-to-date copy I have is in this failing
drive.  Not what I wanted...  I may have caused this by my flailing around
trying to make failover work.

-- 
Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim
from you at all times your undisguised opinions. - William Wilberforce

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] slow write performance

2012-09-12 Thread Dan Barker
a) use oflag=direct or dd will just test caching.
b) You show a 183MB/s rate, which is pretty good. However, the target
appears not to be the drbd volume, you don't say if the resource is
connected or waiting for connection, and you don't describe the underlying
hardware.
c) How is /mnt/mysql/syncing_drbd related to opt/drbd-test.loop or
/dev/drdb1?

Dan (the top poster)

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of rahulcs
Sent: Tuesday, September 11, 2012 4:43 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] slow write performance


Hi,
I am using DRBD for syncing files between two machines on a LAN. I have
created a device on a file using dd if=/dev/zero of=/opt/drbd-test.loop
bs=1M count=200
losetup /dev/loop1 /opt/drbd-test.loop.

My drbd resource settings are as follows:
root@obelix101:/tmp# drbdsetup 1 show
disk {
size0s _is_default; # bytes
on-io-error pass_on _is_default;
fencing dont-care _is_default;
max-bio-bvecs   0 _is_default;
}
net {
timeout 60 _is_default; # 1/10 seconds
max-epoch-size  8000;
max-buffers 8000;
unplug-watermark16;
connect-int 10 _is_default; # seconds
ping-int10 _is_default; # seconds
sndbuf-size 0 _is_default; # bytes
rcvbuf-size 0 _is_default; # bytes
ko-count0 _is_default;
after-sb-0pri   disconnect _is_default;
after-sb-1pri   disconnect _is_default;
after-sb-2pri   disconnect _is_default;
rr-conflict disconnect _is_default;
ping-timeout5 _is_default; # 1/10 seconds
on-congestion   block _is_default;
congestion-fill 0s _is_default; # byte
congestion-extents  127 _is_default;
}
syncer {
rate1024000k; # bytes/second
after   -1 _is_default;
al-extents  3389;
on-no-data-accessible   io-error _is_default;
c-plan-ahead0 _is_default; # 1/10 seconds
c-delay-target  10 _is_default; # 1/10 seconds
c-fill-target   0s _is_default; # bytes
c-max-rate  102400k _is_default; # bytes/second
c-min-rate  4096k _is_default; # bytes/second
}
protocol C;
_this_host {
device  minor 1;
disk/dev/loop1;
meta-disk   internal;
address ipv4 192.168.245.101:7789;
}
_remote_host {
address ipv4 192.168.245.102:7789;
}

I am getting really bad write performance:

root@obelix101:/tmp# time dd if=/dev/zero of=/mnt/mysql/syncing_drbd bs=1M
count=12
12+0 records in
12+0 records out
12582912 bytes (13 MB) copied, 0.0686793 s, 183 MB/s

real3m49.773s
user0m0.000s
sys 0m0.168s

What am i doing wrong ?

Thanking you,
Rahul 
-- 
View this message in context:
http://old.nabble.com/slow-write-performance-tp34419641p34419641.html
Sent from the DRBD - User mailing list archive at Nabble.com.

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD volumes have different files, but DRBD reports them being in sync

2012-09-02 Thread Dan Barker
See below 

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Bala Ramakrishnan
Sent: Saturday, September 01, 2012 4:24 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] DRBD volumes have different files, but DRBD reports
them being in sync

I have a DRBD installation on two machines running Centos 6.3 and DRBD 8.4.1

I have a single resource 'agalaxy' being synced across these two machines.
This resource has two volumes:
Volume 0: /dev/drbd0 mounted on /a10data And Volume 1: /dev/drbd1 mounted on
/a10.

Volume 0 is running Postgres. I did a lot of other activities with DRBD
shutdown. 

 Activities with DRBD shutdown? Did any of these activities affect
lv_a10data or lv_a10? If so, the DRBD metadata won't know about it and can't
sync it.

However after a while, I found that the contents in the directory /a10data
on one of the machines was different (some intermediate level directories
were missing), yet DRBD (cat /proc/drbd) reported that the file systems were
in sync.

 You can't view the secondary node - it can't be mounted. So what steps
are you not telling us?

Ultimately, I had to re-initialize and resync the volume by invalidating it:
Drbdadm invalidate agalaxy/0

 Careful that you do this on the correct node. One of the nodes should
have the correct data and the other should have obsolete data. DRBD can
sync in either direction. Until/unless you identify how you got the volumes
to diverge, you'll probably repeat the problem. If the DRBD nodes get
disconnected AND you modify the secondary to primary and mount/write the
disk AND you then reconnect the nodes, DRBD will recognize the situation
(called Split-brain). You didn't mention split brain, so you must have
written to one of the LVs while DRBD was down.

Has anyone run into this kind of issue?

 In summary, you can't say DRBD reports them being in sync when DRBD
is down, and you can't access the secondary volumes with DRBD is up -
Something is missing from your question.

 Dan

===

For example, on balar-lnx3, the contents of /a10data/db/data/system was:
[root@balar-lnx3 system]# ls
12531  12664  12670  12675  12681  12687  12779
12531_fsm  12666  12671  12677  12682  12688  pg_control
12531_vm   12667  12672  12678  12683  12773
pg_filenode.map
12533  12668  12672_fsm  12679  12683_fsm  12775
pg_internal.init
12534  12668_fsm  12672_vm   12679_fsm  12683_vm   12777  pgstat.stat
12662  12668_vm   12674  12679_vm   12685  12778

The contents of /a10/data/db/data/system on balar-lnx was:
[root@balar-lnx system]# ls
base pg_ident.conf  pg_stat_tmp  PG_VERSION
global   pg_multixact   pg_subtrans  pg_xlog
pg_clog  pg_notify  pg_tblspcpostgresql.conf
pg_hba.conf  pg_serial  pg_twophase  postmaster.opts
[root@balar-lnx system]#

The contents of /a10data/db/data/system on balar-lnx3 was actually the
contents of /a10data/db/data/system/global on balar-lnx. Yet, DRBD was
reporting the status:
[root@balar-lnx3 ~]# cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil@Build64R6,
2012-04-17 11:28:08
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
ns:0 nr:173009724 dw:173009724 dr:0 al:0 bm:10560 lo:0 pe:0 ua:0 ap:0
ep:1 wo:f oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@balar-lnx3 ~]#

#=
Here is my drbd conf:

1. Global:

global {
usage-count yes;
# minor-count dialog-refresh disable-ip-verification }

common {
handlers {
pri-on-incon-degr /usr/lib/drbd/notify-pri-on-incon-degr.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b  /proc/sysrq-trigger ;
reboot -f;
pri-lost-after-sb /usr/lib/drbd/notify-pri-lost-after-sb.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b  /proc/sysrq-trigger ;
reboot -f;
local-io-error /usr/lib/drbd/notify-io-error.sh;
/usr/lib/drbd/notify-emergency-shutdown.sh; echo o  /proc/sysrq-trigger ;
halt -f;
}

startup {
wfc-timeout 0;
degr-wfc-timeout 120;
}

options {
# cpu-mask on-no-data-accessible
}

disk {
on-io-error detach;
}

net {
protocol C;
}
}

2. Agalaxy.res:
resource agalaxy {
disk {
resync-rate 100M;
fencing resource-only;
}
handlers {
fence-peer /usr/lib/drbd/crm-fence-peer.sh;
after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
}
on balar-lnx {
address 10.0.1.1:7788;
volume 0 {
device  /dev/drbd0;
disk/dev/vg_balarlnx3/lv_a10data;
meta-disk   internal;
}
volume 1 {
device  /dev/drbd1

Re: [DRBD-user] using truck based replication, but trying to save some gas

2012-08-21 Thread Dan Frincu
Hi,

On Mon, Aug 6, 2012 at 1:28 AM, Two Spirit twospirit6...@gmail.com wrote:
 http://www.drbd.org/users-guide/s-using-truck-based-replication.html

 I've already got remote sites (small sites with slow bw) that can hold data,
 and I'm regularly getting new data that I'd like to replicate remotely to
 multiple sites. I'm evaluating using drbd for that, but I wanted some
 torrent type technology to replicate to multiple sites. I currently use
 rsync, but revering the rsync in a recovery scenario is not desireable. I
 have used the shipping method to ship and seed to a remote backup site, but
 I'd like to avoid mainly because from the recovery point of view, when a
 major disaster happens, I want to start recoverying and utilize more
 bandwidth than what one remote site can upload, and maybe I can put myself
 in a situation where recovery would be faster than the truck method for
 recovery especially since the small sites use DSL and upload speeds are way
 smaller than their DL speeds.

For low bandwidth connections, there's DRBD Proxy, you'd have to talk
to the sales guys over at Linbit for more details. Don't know how it
covers multi-site scenarios, usually DRBD works only between 2
locations. The exception is the the DRBD stacked scenarios, when data
can be in up to 4 different locations. However in most cases, writes
are done only in one locations. For situations where you need to write
at the same time in different locations, you'll need at least a
cluster aware FS for that.


 anyone know of any torrent technology for remote [secure] replications that
 I might be able to use? I'm evaluating the tahoe-lafs currently for that,
 but I'd like something different.

Sorry, no clue.

HTH,
Dan



 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user




-- 
Dan Frincu
CCNA, RHCE
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] ds:Uptodate not uptodate

2012-08-16 Thread Dan Barker
Run a verify. It will show issues, I imagine. The version you like should
be primary. drbdadm disconnect and then drbdadm connect (from either node)
and they'll sync up.

 

I imagine some changes were made to one of the devices while not under drbd
control, so no stale bits were set in the bitmap. Verify will find them
all and update the bitmap.

 

Also, be CERTAIN you are not mounting the underlying block devices not via
drbd anywhere, ever.

 

Oops, I just reread your message, The systems have been synced twice. How
are you performing the sync? The verify/disconnect/reconnect is the
fastest/best way to resync nearly identical resources. If you have truly
synced the devices and not written to them outside drbd control, the only
way they can diverge is hardware failure - rarely undetected.

 

Good luck. Let us know how it goes!

 

Dan

 

From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Two Spirit
Sent: Saturday, August 04, 2012 11:14 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] ds:Uptodate not uptodate

 

Hi, newbie here,

I'm trying to figure out what is going on. I've got 2 servers,
primary/secondary, disk states are both uptodate,

but there is different information on both on both servers.

If I make server1 the primary and mount it, I find file1 in the file system.

if I make server1 the secondary and server2 the primary, I find file2 in the
filesystem, but not file1.

The systems have been synced twice. Not sure how to proceed to figure this
out.

 

help.

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Switching from internal to external meta-disk

2012-07-11 Thread Dan Barker
And further, the FIRST step in any maintenance should be is cat
/proc/drbd. You would have seen which node had the current data.

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Felix Frank
Sent: Wednesday, July 11, 2012 3:33 AM
To: Jean-Baptiste
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Switching from internal to external meta-disk

Hi,

On 07/10/2012 06:59 PM, Jean-Baptiste wrote:
  9. Doing drbdadm -- --overwrite-data-of-peer primary RES (from the
 primary node)
 10. Let synchronize process ending
 11. Done
 
 At this step everything is fine, my SGBD was restarted without any 
 warning, nothing seems to go wrong.
 But ... I was lost 11 days of data in my SGBD.

we've seen similar effects on several occasions on this list. So far, it has
always (iirc) been a case of diskless primary.

Have you retained logs from 11 days ago? I'd expect you to find a note
around that time stating that your primary detached its backing device.

*If* this assumption is correct, here's what's happened then: You primary
happily kept writing data, but it never reached its local HDD.
Instead, all changes were written to the secondary's disk only. When you did
your changes and overwrote the data of the secondary, you killed your data.

Bottom line is, it's crucial to be mindful of the health state of your
resources. Ideally, monitoring should report whenever your disks are not
UpToDate/UpToDate, among other possible problems.

HTH,
Felix
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Parse error an option keyword expected but got fence peer

2012-06-22 Thread Dan Barker
That is an ancient DRBD. I've been using drbd for years (at least 3), and I
don't remember anything before 8.3.7. I'm running 8.4.1 now. Why are you on
such an old version? Maybe the command syntax was different and the messages
are simply correct. I don't know about 8.2, but I do know the documentation
website is segregated into 8.4.x and 8.3.x sections due to command syntax
changes. I may be completely off base, but if command parsing is throwing
errors, maybe your configuration is invalid. You didn't post the entire
config file (Although I don't know what was or was not valid on that old
release). 

I don't think this will help you much, but there's always hope! (Hope that
you'll try a more recent drbd - then I CAN help you out).

Of course, some other users may be familiar with that version and help you
without you having to  install a current version. I believe 8.3.13 is the
recommended level.

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Keith Christian
Sent: Friday, June 22, 2012 7:47 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Parse error an option keyword expected but got fence
peer

I've searched for a solution to this error, lots of hits for Parse error
but couldn't find anything specific for fence-peer.

I have checked the drbd.conf file for obvious errors like unbalanced braces,
and missing semicolons at the end of line.  Nothing found.

Using these RPM's:

drbd82-8.2.6-1.el5.centos
kmod-drbd82-8.2.6-2



This is on a 64 bit system, so I fixed line 31 which needed lib64 to find
the file:

ls -l /usr/lib64/heartbeat/drbd-peer-outdater
-rwxr-xr-x 1 root root 15984 Feb  6  2008
/usr/lib64/heartbeat/drbd-peer-outdater



When running any DRBD command I see this error:

drbdadm create-md drbd-resource-0
/etc/drbd.conf:31: Parse error: 'an option keyword' expected,
but got 'fence-peer'




I commented out line 31, tried to start DRBD again, and saw the error on
line 56, removed the comment from line 31, and the error returns to line 31.

service drbd start
/etc/drbd.conf:56: Parse error: 'an option keyword' expected,
but got 'outdated-wfc-timeout'
Starting DRBD resources:/etc/drbd.conf:56: Parse error:
'an option keyword' expected,
but got 'outdated-wfc-timeout'

53 # Wait for connection timeout if the peer node is
already outdated.
54 # (Do not set this to 0, since that means unlimited)
55 #
*** 56 outdated-wfc-timeout 2;  # 2 seconds.
57# In case there was a split brain situation the
devices will
58 # drop their network configuration instead of connecting.
Since



Below are the first 35 lines of the file, which enclose the line throwing
the error:

 1 global { usage-count no; }
 2
 3 resource drbd-resource-0 {
 4   protocol C;
 5
 6 handlers {
 7   # what should be done in case the node is primary, degraded
 8 # (=no connection) and has inconsistent data.
 9 pri-on-incon-degr /usr/lib/drbd/notify-pri-on-incon-degr.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b  /proc/sysrq-trigger ;
reboot -f;
10
11 # The node is currently primary, but lost the after split
brain
12 # auto recovery procedure. As as consequence it should go
away.
13 pri-lost-after-sb /usr/lib/drbd/notify-pri-lost-after-sb.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b  /proc/sysrq-trigger ;
reboot -f;
14
15 # In case you have set the on-io-error option to
call-local-io-error,
16 # this script will get executed in case of a local IO error.
It is
17 # expected that this script will case a immediate failover in
the
18 # cluster.
19 local-io-error /usr/lib/drbd/notify-local-io-error.sh;
/usr/lib/drbd/notify-emergency-shutdown.sh; echo o  /proc/sysrq-trigger ;
halt -f;
20
21
22 # Commands to run in case we need to downgrade the peer's
disk
23 # state to Outdated. Should be implemented by the superior
24 # communication possibilities of our cluster manager.
25 # The provided script uses ssh, and is for
demonstration/development
26 # purposis.
27 # fence-peer /usr/lib/drbd/outdate-peer.sh on amd
192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12;
28 #
29 # Update: Now there is a solution that relies on heartbeat's
30 # communication layers. You should really use this.
*** 31 fence-peer /usr/lib64/heartbeat/drbd-peer-outdater -t
5;
32 # For Pacemaker you might use:
33 # fence-peer /usr/lib/drbd/crm-fence-peer.sh;
34
35 }



I'd appreciate any insight or help.


== Keith

Re: [DRBD-user] Shrink an LVM partition and create a new one for the /drbd mount ?

2012-06-16 Thread Dan Barker
Sounds like your root file system is in LogVol01. Makes working on it nearly
impossible. A good live-CD with LV tools is rescatux
(www.supergrubdisk.org/rescatux). Good luck! I just used it to reorganize
some root LVs on RHEL 5.

Dan


-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Pascal BERTON
Sent: Saturday, June 16, 2012 3:32 AM
To: 'Keith Christian'; drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Shrink an LVM partition and create a new one for
the /drbd mount ?

Hi Keith!

It does. Don't forget to shrink your filesystem(s) first. You'll find a
whole lot of howtos on the net to achieve that depending on the filesystem
you actually use.

Pascal.

-Message d'origine-
De : drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] De la part de Keith Christian
Envoyé : samedi 16 juin 2012 00:13 À : drbd-user@lists.linbit.com Objet :
[DRBD-user] Shrink an LVM partition and create a new one for the /drbd mount
?

I have a pre-configured server using LVM, with one large logical volume
using all disk space beyond the /boot partition, and a separate /drbd
partition is needed.

Normally, I remove LVM and create a /drbd partition after formatting the
RAID array.

That isn't an easy option here, as the machine is pre-configured with the
OS, etc.

Am I correct that the analog to a physical /drbd partition under LVM is the
creation of a logical volume in a volume group?



$ lvdisplay
  --- Logical volume ---
  LV Name/dev/VolGroup00/LogVol01
  VG NameVolGroup00


The lvcreate command failed, not enough free extents:


$ lvcreate -v --size 1g --name LogVol02 VolGroup00
Setting logging type to disk
Finding volume group VolGroup00
  Insufficient free extents (0) in volume group VolGroup00: 32 required


I'll have to shrink LogVol01 to make room for a LogVol02 on which /drbd will
be mounted.

Does this sound like the correct way of creating a partition for /drbd in
this situation?


Thanks!


Keith
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Corosync Configuration

2012-06-11 Thread Dan Barker
Off OP's topic, but a correction.

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Pascal BERTON
Sent: Thursday, June 07, 2012 1:29 PM
To: 'Jake Smith'; drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Corosync Configuration

I also used this manual to startup, I agree with Jake, that's a typo. Apart 
from that, William, your bindnetaddr parameter should not end with a 0, 
it's supposed to be the IP address your local host/node will use to monitor its 
peers. Instead, replace it by the IP address your server has on network 
192.168.1.0.

snip

Although it is extremely likely that 'should not end with a 0' is a correct 
statement on this network (it does begin with a 192 and we assumed it is a 
default 24 bit subnet), it is not a requirement. The OP didn't post his subnet 
specification.

Any network with a subnet mask of 23 bits or less can have a final octet of 
Zero be a host. In a class A subnet, there is one network (ends with a .0.0.0), 
255 hosts that end with two zeros (.0.0) and 65,280 hosts that end with a 
single zero .0).

I normally wouldn't make this minor a correction, but Pascal then said The 
cool thing (to me) is that I learn something more everyday on this ML...

Dan

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] [patch -resend] drbd: fix resync_dump_detail() output

2012-06-08 Thread Dan Carpenter
The tests here aren't correct.  It should be doing a shift before doing
the bitwise AND.  (bme-flags  BME_NO_WRITES) is always false and
(bme-flags  BME_LOCKED) checks for BME_NO_WRITES instead of checking
for locked.

Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
---
I sent this to the drbd-user list in March, but never recieved a
response.

diff --git a/drivers/block/drbd/drbd_proc.c b/drivers/block/drbd/drbd_proc.c
index 2959cdf..ffe1ee4 100644
--- a/drivers/block/drbd/drbd_proc.c
+++ b/drivers/block/drbd/drbd_proc.c
@@ -187,8 +187,10 @@ static void resync_dump_detail(struct seq_file *seq, 
struct lc_element *e)
struct bm_extent *bme = lc_entry(e, struct bm_extent, lce);
 
seq_printf(seq, %5d %s %s\n, bme-rs_left,
-  bme-flags  BME_NO_WRITES ? NO_WRITES : -,
-  bme-flags  BME_LOCKED ? LOCKED : --
+  test_bit(BME_NO_WRITES, bme-flags) ?
+   NO_WRITES : -,
+  test_bit(BME_LOCKED, bme-flags) ?
+   LOCKED : --
   );
 }
 
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Syncer's after dependency for online verifying

2012-06-04 Thread Dan Barker
 Is it possible to have DRBD verify B only when A is done verifying?

I use crontab.

On Monday at 2:11 AM, verify r0 
On Tuesday at 2:11 AM, verify r1
On Wednesday at 2:11 AM, verify r2 
On Thursday at 2:11 AM, verify r3

  --or--

11  2  *   *   1 /sbin/drbdadm verify r0
11  2  *   *   2 /sbin/drbdadm verify r1
11  2  *   *   3 /sbin/drbdadm verify r2
11  2  *   *   4 /sbin/drbdadm verfiy r3

Dan


-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Friday, June 01, 2012 11:50 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Syncer's after dependency for online verifying

On Fri, Jun 01, 2012 at 12:03:20PM +0200, Lionel Sausin wrote:
 Dear DRBD community,
 
 We're using DRBD 8.3.11 with 2 resources called A and B.
 B is configured to sync after A, but when I run drbdadm verify all
 both resources start verifying immediately.

That is intentional.

In error scenarios, you do not have much control over when a network
fails/recovers, or a node reboots.
So there we have dependencies to not overload the system with too much
parallel resync activity.

For verify, that is entirely triggered from userspace, you have full
controll.

 Is it possible to have DRBD verify B only when A is done verifying?

Sure.
Just don't start the verify on B while the verify on A is still running.

 ;-)

like,
 for minor in 0 1 2 ; do
drbdsetup $minor verify ; drbdsetup $minor wait-sync  done

Or similar.

 If not, has it been/will it be possible in versions after 8.3.11?

I don't see any kernel support for verify dependencies coming soon, as you
can easily serialize verify operations in userland.


Besides,
now that we have the dynamically adaptive resync rate controller, and it is
used for verify as well, for most cases I'd recommend to enable it globally,
and get rid of the resync dependencies.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] startup hang after yes

2012-05-29 Thread Dan Barker
Machines are now in semi-production (manual start/stop for obvious reasons).

The issue still occurs. If I remove the replication cable and boot the
secondary machine (with 4, up to date resources), the boot process hangs
after I reply yes to the prompt. Reinserting the cable does allow the
startup scripts to continue, with the error message waitpid: Interrupted
system call, but simply replying yes is supposed to do so, with no error.

I don't recall seeing this problem before, and I've been through about 4
drbd release levels.

Dan (the top poster)

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Dan Barker
Sent: Saturday, May 26, 2012 1:23 PM
To: drbd List
Subject: [DRBD-user] startup hang after yes

I'm building a new drbd machine. I put 8.4.0 on a Debian 603 and all looks
fine except ...

Since I'm testing, I don't have another node. I did create-md and then
primary --force. At boot time, there is no peer, so I get the count-up to
yes. When I enter yes, nothing happens.

If I ssh in and stop/start drbd, all is normal and my initialization scripts
finally run (the ones after drbd).

What can I do to stop the hang?

Other possibly mitigating factors:
 There is no Ethernet cable connected to the NIC for DRBD synchronization.
 There is only one drbd resource defined, drbd3 (no 0, 1 or 2).

I chose 8.4.0 to match the peer in the environment. I thought about 8.4.1 or
8.3.13, but I'll just update everybody to 9 soon.

Dan

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Problem between two nodes but only on 1 ressource

2012-05-29 Thread Dan Barker
You want to preserve the data on node1, not on node2. You may have been
reading the doc from the point of view of the wrong side.

On node1 you want to discard the peer's data.
  -- or --
On node2 you want to discard my data.

The syntax is different from 8.3 to 8.4 but I'm sure you have that handy.

Please tell me you are not accessing /www from node2 (except in an
emergency). If you are accessing the resource from both nodes at once, that
is a whole different matter and will cause split-brain all the time (unless
you are using a cluster-aware file system, and it sounds like you are not).

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of fmorcamp
Sent: Tuesday, May 29, 2012 9:01 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Problem between two nodes but only on 1 ressource


Hi Florian,

Thanks for answear. I already see this commands but I wouldn't do it cause
I'm am in production env.
So my node1 is ok but the ressource www is not duplicate on the node2, so my
node2 didn't have all latest files of the node1.
And if I understand well, the command will put my node2 as primary. But I do
that, it will remove and modify all files which have been removed or modify
on my node1 at the next sync no?

Fabien


Florian Haas-5 wrote:
 
 Somebody have a way for me?!
 
 http://www.drbd.org/users-guide-8.3/s-resolve-split-brain.html
 
 Cheers,
 Florian
 
 --
 Need help with High Availability?
 http://www.hastexo.com/now
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
 
 

--
View this message in context:
http://old.nabble.com/Problem-between-two-nodes-but-only-on-1-ressource-tp33
771275p33924925.html
Sent from the DRBD - User mailing list archive at Nabble.com.

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] startup hang after yes

2012-05-26 Thread Dan Barker
I'm building a new drbd machine. I put 8.4.0 on a Debian 603 and all looks
fine except ...

Since I'm testing, I don't have another node. I did create-md and then
primary --force. At boot time, there is no peer, so I get the count-up to
yes. When I enter yes, nothing happens.

If I ssh in and stop/start drbd, all is normal and my initialization scripts
finally run (the ones after drbd).

What can I do to stop the hang?

Other possibly mitigating factors:
 There is no Ethernet cable connected to the NIC for DRBD synchronization.
 There is only one drbd resource defined, drbd3 (no 0, 1 or 2).

I chose 8.4.0 to match the peer in the environment. I thought about 8.4.1 or
8.3.13, but I'll just update everybody to 9 soon.

Dan

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] I need Reset DRBD Service

2012-05-18 Thread Dan Barker
On the primary, execute the command drbdadm connect all. That will either
work, or put meaningful messages into the log. If they connect, you are
fine. However, I suspect they will not connect. Look in the log (dmesg) for
drbd messages. They will tell the reason for the connection not being
established.

 

Dan

 

From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Cristian Caceres
Sent: Thursday, May 17, 2012 5:15 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] I need Reset DRBD Service

 

Hi all, I have little experience in drbd, in fact I received as a legacy a
system with this implementation, my problem is that one of the nodes, the
secondary, we had to restart, but now I see they are not connected according
to, I have sought some solution without success, please if someone can help
me decipher this I would appreciate.


the status of each server is the following:

Primary Server:

drbd driver loaded OK; device status:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
r...@tschawytscha.multiexportfoods.com, 2008-12-23 13:00:05
m:res cs st ds p mounted fstype
0:??not-found?? StandAlone Primary/Unknown UpToDate/DUnknown -

Secondary Server:

drbd driver loaded OK; device status:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
r...@tschawytscha.multiexportfoods.com, 2008-12-23 13:00:05
m:res cs st ds p mounted fstype
0:??not-found?? WFConnection Secondary/Unknown UpToDate/DUnknown B

 

Thanks..

 

rca

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd wrong lower device doubt

2012-05-18 Thread Dan Barker
The configurations SHOULD match to simplify maintenance, but server A will 
completely ignore server B settings, and server B will completely ignore server 
A settings. Your config is being read by the servers as:

 

Host A: 

resource r0 {

on ME {

device /dev/drbd0;

disk /dev/vg01/share;

address 2.2.2.150:7788;

meta-disk internal;

}

}

and Host B is like this:

resource r0 {

on ME {

device /dev/drbd0;

disk /dev/vg02/share;

address 2.2.2.151:7788;

meta-disk internal;

}

}

 

Which is (I believe) what you wanted.

 

Dan

 

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of ??
Sent: Friday, May 18, 2012 3:20 AM
To: drbd-user
Subject: [DRBD-user] drbd wrong lower device doubt

 

 

Hi All, I have some doubts about drbd. I have configure two servers as Host A 
and Host B. Host A drbd configuration is like this: 

resource r0 {

on A {

device /dev/drbd0;

disk /dev/vg01/share;

address 2.2.2.150:7788;

meta-disk internal;

}

on B {

device /dev/drbd0;

disk /dev/vg01/share;

address 2.2.2.151:7788;

meta-disk internal;

}

}

and Host B is like this:

resource r0 {

on A {

device /dev/drbd0;

disk /dev/vg02/share;

address 2.2.2.150:7788;

meta-disk internal;

}

on B {

device /dev/drbd0;

disk /dev/vg02/share;

address 2.2.2.151:7788;

meta-disk internal;

}

}

You can notice that Host A and Host B configuration file is not same. Actually 
Host A lower device is /dev/vg01/share and Host B lower device is  
/dev/vg02/share. The specified destination lower device is wrong in each 
server. Network setting is right . I set Host A disk state to UpToDate and Host 
B disk state inconsistent. I find that Host A is syncing to Host B.  Why it can 
work regularly when I configure wrong lower device.

 

 

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Primary/Unknown UpToDate/Outdated

2012-05-02 Thread Dan Barker
ON the Primary, do a drbdadm connect resource or drbdadm connect all.

 

Dan

 

From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Norman
Sent: Monday, April 30, 2012 1:45 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Primary/Unknown UpToDate/Outdated

 

Hi All,

I'm new to the list. I don't deal with drbdadm all the time because drbd
just runs (usually).  I've got a weird state with my drbd (2 nodes).
Everything was fine till, I shutdown my 'secondary' for a few minutes just
to re-seat a drive.

Now, the Primary says...

 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/Outdated   r
ns:7478380 nr:12656820 dw:22139484 dr:17976297 al:52974 bm:12391 lo:4
pe:0 ua:0 ap:3 ep:1 wo:b oos:795380

Secondary says...

1: cs:WFConnection ro:Secondary/Unknown ds:Outdated/DUnknown C r
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Nothing really changed, both servers can ping each other via the drbd
channel. How do I get the servers connected again and the 'secondary' to be
'UpToDate' again. Is it easily re-synced.  Help Please.

Thanks,
Norman











___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] multipathd ignores my drbd0

2012-04-10 Thread Dan Barker
I've tried everything I can think of to get mapper to pick up the drbd0 device, 
and all failed. I put in a symlink (ln -s /dev/drbd0 /dev/mapper/DRBD0) but 
Oracle VM doesn't see it. I'm guessing it's requiring multipath to access the 
resource.

I'll try some Logical Volumes and see if I can make Oracle VM see those. I know 
I can build a PV/VG/LV from a drbd resource.

Running an iSCSI target and initiator on localhost on this machine is getting a 
bit too weird, even for Oracleg.

Thanks for the tips!

Dan

-Original Message-
From: Kaloyan Kovachev [mailto:kkovac...@varna.net] 
Sent: Tuesday, April 10, 2012 9:53 AM
To: Dan Barker
Cc: drbd List
Subject: Re: [DRBD-user] multipathd ignores my drbd0

I think you can't use DRBD device directly with multipath, but if you export it 
via iSCSI and then import it back it is possible.

Another option is LVM over DRBD - device in /dev/mapper Another one is just a 
symlink in /dev/mapper from udev rule

On Tue, 10 Apr 2012 09:41:13 -0400, Dan Barker dbar...@visioncomm.net
wrote:
 How do I get multipathd to notice my drbd block devices?
 
 RHEL5 (Oracle VM 3.0.3.127, actually), 2.6.32.21-45xen.
 Drbd 8.4.1:
 multipath-tools says v0.4.9. I can't seem to find the multipath version.
 
 resource r0 {
 on OVMPam {
 volume 0 {
 device   /dev/drbd0 minor 0;
 disk /dev/sdb;
 meta-diskinternal;
 }
 address  ipv4 172.30.0.167:7789;
 }
 on DRPam {
 volume 0 {
 device   /dev/drbd0 minor 0;
 disk /dev/sdb;
 meta-diskinternal;
 }
 address  ipv4 172.30.0.170:7789;
 }
 startup {
 become-primary-on OVMPam;
 }
 }
 
 As-distributed global.
 
 All multipath commands reply Apr 09 08:10:50 | DM multipath kernel
driver
 not loaded, as if no devices were detected at boot time.
 
 I have drbd start before multipathd.
 
 Relax, I have no plans to multipath to this device, Oracle VM looks 
 only
in
 /dev/mapper for repository candidates.
 
 New 10 April: Ping! Anyone have any ideas? Even a different forum in
which
 to ask?
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Unable to perform initial sync

2012-04-10 Thread Dan Barker
I do not understand what you did or are trying to do. In Sync has no
direction. If you are in sync from Primary to Secondary, you are in  sync
period. There is no reason to think about a direction.

To do a recovery, when one of the resources is to be used as the sync source
due to having more current data, there are commands to do that,
discard-my-data, overwrite-data-of-peer, etc. They have no inherent
direction, but have complimentary meanings depending on which node the
command is run.

Please explain more what you are trying to accomplish, which node is
primary, secondary, which device is considered current and the commands
you are issuing and on which host.

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Vyacheslav
Karpukhin
Sent: Tuesday, April 10, 2012 12:23 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Unable to perform initial sync

Hi. I just installed drbd and now trying to perform initial sync. It works
fine in one direction, but if I'm trying to perform it in the opposite
direction, I get this:

 Apr 10 11:13:54 web kernel: block drbd0: Becoming sync source due to disk
states.
 Apr 10 11:13:54 web kernel: block drbd0: peer( Unknown - Secondary )
conn( WFReportParams - WFBitMapS ) 
 Apr 10 11:13:54 web kernel: block drbd0: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
 Apr 10 11:13:54 web kernel: block drbd0: receive bitmap stats
[Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
 Apr 10 11:13:54 web kernel: block drbd0: helper command: /sbin/drbdadm
before-resync-source minor-0
 Apr 10 11:13:54 web kernel: block drbd0: helper command: /sbin/drbdadm
before-resync-source minor-0 exit code 0 (0x0)
 Apr 10 11:13:54 web kernel: block drbd0: conn( WFBitMapS - SyncSource ) 
 Apr 10 11:13:54 web kernel: block drbd0: Began resync as SyncSource (will
sync 15519040 KB [3879760 bits set]).
 Apr 10 11:13:54 web kernel: block drbd0: updated sync UUID
AF78EBCA7F218B01:C237DF3A275A375B:C236DF3A275A375B:C235DF3A275A375B
 275A375B:C235DF3A275A375B
 Apr 10 11:13:54 web kernel: block drbd0:
/dev/shm/drbd-8.4.1/drbd/drbd_receiver.c:2541: sector: 0s, size: 4194304
 Apr 10 11:13:54 web kernel: d-con r0: error receiving RSDataRequest, e:
-22 l: 0!
 Apr 10 11:13:54 web kernel: d-con r0: peer( Secondary - Unknown ) conn(
SyncSource - ProtocolError ) 
 Apr 10 11:13:54 web kernel: d-con r0: asender terminated
 Apr 10 11:13:54 web kernel: d-con r0: Terminating asender thread

What's wrong?

Thank you.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] multipathd ignores my drbd0

2012-04-10 Thread Dan Barker
Kaloyan: I can't get Oracle VM to look at the Logical Volume either (building a 
lv on drbd was dirt-simple). I think I'm hosed. I'll return to the Oracle forum 
and see what they say about LVM, but unless I can get the drbd resource to be 
presented by multipath, I don't think Oracle VM will use it.

Thanks for the help, none the less.

Dan

-Original Message-
From: Kaloyan Kovachev [mailto:kkovac...@varna.net] 
Sent: Tuesday, April 10, 2012 9:53 AM
To: Dan Barker
Cc: drbd List
Subject: Re: [DRBD-user] multipathd ignores my drbd0

I think you can't use DRBD device directly with multipath, but if you export it 
via iSCSI and then import it back it is possible.

Another option is LVM over DRBD - device in /dev/mapper Another one is just a 
symlink in /dev/mapper from udev rule

On Tue, 10 Apr 2012 09:41:13 -0400, Dan Barker dbar...@visioncomm.net
wrote:
 How do I get multipathd to notice my drbd block devices?
 
 RHEL5 (Oracle VM 3.0.3.127, actually), 2.6.32.21-45xen.
 Drbd 8.4.1:
 multipath-tools says v0.4.9. I can't seem to find the multipath version.
 
 resource r0 {
 on OVMPam {
 volume 0 {
 device   /dev/drbd0 minor 0;
 disk /dev/sdb;
 meta-diskinternal;
 }
 address  ipv4 172.30.0.167:7789;
 }
 on DRPam {
 volume 0 {
 device   /dev/drbd0 minor 0;
 disk /dev/sdb;
 meta-diskinternal;
 }
 address  ipv4 172.30.0.170:7789;
 }
 startup {
 become-primary-on OVMPam;
 }
 }
 
 As-distributed global.
 
 All multipath commands reply Apr 09 08:10:50 | DM multipath kernel
driver
 not loaded, as if no devices were detected at boot time.
 
 I have drbd start before multipathd.
 
 Relax, I have no plans to multipath to this device, Oracle VM looks 
 only
in
 /dev/mapper for repository candidates.
 
 New 10 April: Ping! Anyone have any ideas? Even a different forum in
which
 to ask?
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] multipathd ignores my drbd0

2012-04-10 Thread Dan Barker
Yes. In fact I had a bit of bother to get multipath not to grab my DRBD
resources. The multipath.conf blacklist regexps didn't seem to work the way
I expected. I resorted to putting the scsi-id names in blacklisted.wwids. Of
course, that may mean that the drbd resource was blacklisted, but multipath
-ll -v3 has zero occurrences of drbd in it.

Dan

-Original Message-
From: Kushnir, Michael (NIH/NLM/LHC) [C] [mailto:michael.kush...@nih.gov] 
Sent: Tuesday, April 10, 2012 1:23 PM
To: Dan Barker; 'drbd List'
Subject: RE: [DRBD-user] multipathd ignores my drbd0

Did you check multipath blacklists?

-Michael 

-Original Message-
From: Dan Barker [mailto:dbar...@visioncomm.net]
Sent: Tuesday, April 10, 2012 12:40 PM
To: 'drbd List'
Subject: Re: [DRBD-user] multipathd ignores my drbd0

I've tried everything I can think of to get mapper to pick up the drbd0
device, and all failed. I put in a symlink (ln -s /dev/drbd0
/dev/mapper/DRBD0) but Oracle VM doesn't see it. I'm guessing it's requiring
multipath to access the resource.

I'll try some Logical Volumes and see if I can make Oracle VM see those. I
know I can build a PV/VG/LV from a drbd resource.

Running an iSCSI target and initiator on localhost on this machine is
getting a bit too weird, even for Oracleg.

Thanks for the tips!

Dan

-Original Message-
From: Kaloyan Kovachev [mailto:kkovac...@varna.net]
Sent: Tuesday, April 10, 2012 9:53 AM
To: Dan Barker
Cc: drbd List
Subject: Re: [DRBD-user] multipathd ignores my drbd0

I think you can't use DRBD device directly with multipath, but if you export
it via iSCSI and then import it back it is possible.

Another option is LVM over DRBD - device in /dev/mapper Another one is just
a symlink in /dev/mapper from udev rule

On Tue, 10 Apr 2012 09:41:13 -0400, Dan Barker dbar...@visioncomm.net
wrote:
 How do I get multipathd to notice my drbd block devices?
 
 RHEL5 (Oracle VM 3.0.3.127, actually), 2.6.32.21-45xen.
 Drbd 8.4.1:
 multipath-tools says v0.4.9. I can't seem to find the multipath version.
 
 resource r0 {
 on OVMPam {
 volume 0 {
 device   /dev/drbd0 minor 0;
 disk /dev/sdb;
 meta-diskinternal;
 }
 address  ipv4 172.30.0.167:7789;
 }
 on DRPam {
 volume 0 {
 device   /dev/drbd0 minor 0;
 disk /dev/sdb;
 meta-diskinternal;
 }
 address  ipv4 172.30.0.170:7789;
 }
 startup {
 become-primary-on OVMPam;
 }
 }
 
 As-distributed global.
 
 All multipath commands reply Apr 09 08:10:50 | DM multipath kernel
driver
 not loaded, as if no devices were detected at boot time.
 
 I have drbd start before multipathd.
 
 Relax, I have no plans to multipath to this device, Oracle VM looks 
 only
in
 /dev/mapper for repository candidates.
 
 New 10 April: Ping! Anyone have any ideas? Even a different forum in
which
 to ask?
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Unable to perform initial sync

2012-04-10 Thread Dan Barker
That is part of the story. You most likely have some protocol issues (thus
the log entries).

Why would you experiment with cross-version drbd? You should have the
easiest results with both servers at the same, recent level.

You would need to verify the kernel module and userland program versions on
both servers, the commands run, and the relevant dmesg logs from both sides
for folks to help you on this problem.

... BTW ...

Are you aware that if you do not care about the contents of the disks, you
don't have to sync all the zeros? You can create up-to-date disks instantly,
and then put a file system on it. Everything will be in sync. The first
verify will find a bunch of out-of-sync blocks, but they are in the
filesystem's free space and are synced by simply doing a
disconnect/reconnect on the secondary node. It really speeds up initial
setup, expecially with multi-terabyte resources. (See new-current-uuid in
http://www.drbd.org/users-guide-8.3/re-drbdsetup.html; syntax may be
different on 8.3 versions).

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Vyacheslav
Karpukhin
Sent: Tuesday, April 10, 2012 2:49 PM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Unable to perform initial sync


On 11.04.2012, at 0:36, Dan Barker wrote:

 I do not understand what you did or are trying to do. In Sync has no 
 direction. If you are in sync from Primary to Secondary, you are in  
 sync period. There is no reason to think about a direction.


I'm talking about direction because in my case sync from server B to server
A works fine, but from server A to server B -- fails.

Since it's initial sync of newly created volume, it doesn't matter for me
which of the hosts to make Primary, and which -- Secondary. So, that initial
sync may be performed in two directions -- from server A to server B, or
from server B to server A.

In my case when I mark server B as primary, everything is fine, drbd
synchronizes from B to A. But if instead I mark server A as primary,
synchronization won't perform -- there are protocol errors in the log.

I tried to use different versions of drbd, and found out that this issue
starts with 8.3.11. Right now drbd performs synchronization from A to B with
8.3.10, but I couldn't make it do that with 8.3.11, 8.3.12 and 8.4.1.

In my experiments each time I do the following:
1) create-md on both servers
2) starting the resource on both servers
3) marking one of the servers as primary
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] multipathd ignores my drbd0

2012-04-10 Thread Dan Barker
Yes, I've about come to the conclusion that multipath will not work with a
drbd resource, and Oracle VM will not work with a non-multipath resource.
So, no DRBD  LVM  Multipath  Oracle VM either. Dang!

Of course multipath will work with the underlying device, but then drbd
cannot replicate it except offline and without benefit of the activity log
or bitmap. Close to useless.

Looking like a second server (or SAN) is going to be required to run any
sort of DR other than occasional [incremental] backups managed manually.

Sigh, ...

-Original Message-
From: Pascal BERTON [mailto:pascal.bert...@free.fr] 
Sent: Tuesday, April 10, 2012 3:30 PM
To: 'Dan Barker'; 'drbd List'
Subject: RE: [DRBD-user] multipathd ignores my drbd0

Dan,

Isn't it the backing device you should refer ? To me, if you want multipath
to work correctly, the device has to return an ID. Having said that, if I
try it on my resource having the following config file :
resource vmfs2 {
device /dev/drbd1;
disk /dev/vg_vmfs2/lv_vmfs2;
meta-disk internal;

net {
}

disk {
}

on ipstore11 {
address 195.165.5.245:7791;
}

on ipstore21 {
address 195.165.5.247:7791;
}
}

If I try :
[root@ipstore21 ~]#  scsi_id --whitelisted --device=/dev/vg_vmfs2/lv_vmfs2
3600508b1001c3a60557b6713c82b915c
[root@ipstore21 ~]#  

Now, if I try :
[root@ipstore21 ~]#  scsi_id --whitelisted --device=/dev/drbd1
[root@ipstore21 ~]#  

As far as I understand, if no SCSI ID, multipathd can't identify the device,
so no multipath...
But configuring it with the backing device should work...

HTH.

Regards,

Pascal.


-Message d'origine-
De : drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] De la part de Dan Barker
Envoyé : mardi 10 avril 2012 21:08 À : 'drbd List'
Objet : Re: [DRBD-user] multipathd ignores my drbd0

Yes. In fact I had a bit of bother to get multipath not to grab my DRBD
resources. The multipath.conf blacklist regexps didn't seem to work the way
I expected. I resorted to putting the scsi-id names in blacklisted.wwids. Of
course, that may mean that the drbd resource was blacklisted, but multipath
-ll -v3 has zero occurrences of drbd in it.

Dan

-Original Message-
From: Kushnir, Michael (NIH/NLM/LHC) [C] [mailto:michael.kush...@nih.gov]
Sent: Tuesday, April 10, 2012 1:23 PM
To: Dan Barker; 'drbd List'
Subject: RE: [DRBD-user] multipathd ignores my drbd0

Did you check multipath blacklists?

-Michael 

-Original Message-
From: Dan Barker [mailto:dbar...@visioncomm.net]
Sent: Tuesday, April 10, 2012 12:40 PM
To: 'drbd List'
Subject: Re: [DRBD-user] multipathd ignores my drbd0

I've tried everything I can think of to get mapper to pick up the drbd0
device, and all failed. I put in a symlink (ln -s /dev/drbd0
/dev/mapper/DRBD0) but Oracle VM doesn't see it. I'm guessing it's requiring
multipath to access the resource.

I'll try some Logical Volumes and see if I can make Oracle VM see those. I
know I can build a PV/VG/LV from a drbd resource.

Running an iSCSI target and initiator on localhost on this machine is
getting a bit too weird, even for Oracleg.

Thanks for the tips!

Dan

-Original Message-
From: Kaloyan Kovachev [mailto:kkovac...@varna.net]
Sent: Tuesday, April 10, 2012 9:53 AM
To: Dan Barker
Cc: drbd List
Subject: Re: [DRBD-user] multipathd ignores my drbd0

I think you can't use DRBD device directly with multipath, but if you export
it via iSCSI and then import it back it is possible.

Another option is LVM over DRBD - device in /dev/mapper Another one is just
a symlink in /dev/mapper from udev rule

On Tue, 10 Apr 2012 09:41:13 -0400, Dan Barker dbar...@visioncomm.net
wrote:
 How do I get multipathd to notice my drbd block devices?
 
 RHEL5 (Oracle VM 3.0.3.127, actually), 2.6.32.21-45xen.
 Drbd 8.4.1:
 multipath-tools says v0.4.9. I can't seem to find the multipath version.
 
 resource r0 {
 on OVMPam {
 volume 0 {
 device   /dev/drbd0 minor 0;
 disk /dev/sdb;
 meta-diskinternal;
 }
 address  ipv4 172.30.0.167:7789;
 }
 on DRPam {
 volume 0 {
 device   /dev/drbd0 minor 0;
 disk /dev/sdb;
 meta-diskinternal;
 }
 address  ipv4 172.30.0.170:7789;
 }
 startup {
 become-primary-on OVMPam;
 }
 }
 
 As-distributed global.
 
 All multipath commands reply Apr 09 08:10:50 | DM multipath kernel
driver
 not loaded, as if no devices were detected at boot time.
 
 I have drbd start before multipathd.
 
 Relax, I have no plans to multipath to this device, Oracle VM looks 
 only
in
 /dev/mapper for repository candidates.
 
 New 10 April: Ping! Anyone have any ideas? Even a different forum in
which
 to ask?
 
 ___
 drbd-user mailing list
 drbd

[DRBD-user] multipathd ignores my drbd0

2012-04-09 Thread Dan Barker
How do I get multipathd to notice my drbd block devices?

RHEL5 (Oracle VM 3.0.3.127, actually), 2.6.32.21-45xen.
Drbd 8.4.1:
multipath-tools says v0.4.9. I can't seem to find the multipath version.

resource r0 {
on OVMPam {
volume 0 {
device   /dev/drbd0 minor 0;
disk /dev/sdb;
meta-diskinternal;
}
address  ipv4 172.30.0.167:7789;
}
on DRPam {
volume 0 {
device   /dev/drbd0 minor 0;
disk /dev/sdb;
meta-diskinternal;
}
address  ipv4 172.30.0.170:7789;
}
startup {
become-primary-on OVMPam;
}
}

As-distributed global.

All multipath commands reply Apr 09 08:10:50 | DM multipath kernel driver
not loaded, as if no devices were detected at boot time.

I have drbd start before multipathd.

Relax, I have no plans to multipath to this device, Oracle VM looks only in
/dev/mapper for repository candidates.

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] How to create WFConnection resource with one DRBD host?

2012-04-07 Thread Dan Barker
I need to test that DRBD will peacefully cohabit with Oracle VM. I want to
build a single-node DRBD array and need my resource in WFConnection
Primary/unknown Uptodate/DUnknown.

Otherwise I need two drbds (I can do that, but it's not germane to this
test).

I did a create-md (successful) and a drbdadm new-current-uuid  minor-0
(failed):

[  377.048406] d-con r0: conn( WFConnection - Disconnecting )
[  377.048416] d-con r0: Discarding network configuration.
[  377.048559] d-con r0: Connection closed
[  377.048561] d-con r0: conn( Disconnecting - StandAlone )
[  377.048563] d-con r0: receiver terminated
[  377.048576] d-con r0: Terminating receiver thread
[  377.048594] block drbd0: disk( Inconsistent - Failed )
[  377.057027] block drbd0: disk( Failed - Diskless )
[  377.057132] block drbd0: drbd_bm_resize called with capacity == 0
[  377.057229] d-con r0: Terminating worker thread

8.4.1

What did I miss?



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd storage for Oracle VM

2012-04-04 Thread Dan Barker
 Eduardo, thank you for your input, but Virtual Box (workstation
 product) is a completely different animal than Oracle VM (bare-metal
 hypervisor). I don't get to choose the underlying storage, and OVM3
choses OCFS2.

 I guess I'll simply have to try it out and see if DRBD will
compile/run/etc.

 It will. However OCFS2 will only work on Protocol C (dual-Primary
 mode requires synchronous replication by definition), so your
 original plan of using protocol A or B is out.

 Why do you need to compile though? I thought UEK2 shipped with DRBD.
 Or has that not been made available for Oracle VM yet?

 Florian

 That sounds encouraging. Not having to compile will mean I don't have to
locate all the dependencies.

 UEK2 shipped with DRBD. How would I check?

 modinfo drbd; which drbdadm, or yum search drbd.

 I had already tried drbdadm --help and got command not found. Modinfo
came up with could not find module drbd. yum requires a repository to do
anything. I tried
http://public-yum.oracle.com/repo/OracleLinux/OL5/latest/x86_64/ but drbd
wasn't there. So I tried to compile. I found gcc, make and flex, but am
coming up empty on linux-headers-2.6.32.21-45xen. I don't believe I can
build a drbd without the headers. Any ideas? (I also posted the question on
the Oracle VM forum at forums.oracle.com)

 My commands:
 ./configure --with-km --prefix /usr --sysconfdir /etc --localstatedir /var
 make clean all

 Result:
  Userland tools build was successful.
  SORRY, kernel makefile not found.
  You need to tell me a correct KDIR,
  Or install the neccessary kernel source packages.

 (as expected). Any ideas on finding kernel headers?

 I have no need for dual-primary and I can't see how OCFS2 can even tell
if the underlying block device is dual-primary or single-primary.

 Because the device wouldn't be writable on one of the nodes, which
 OCFS2 is sure to choke on?

 Well, before a failure, xen won't even be running. I don't kwow if OCFS2
will need write permissions on a volume not mounted. I'd expect not. The
stand-by node will bring up drbd secondary and xen not at all. Failover will
include assuring the primary node is down or at least offline, 'drbdadm
primary r0' at the warm site, and then start xen (which will mount the OCFS2
volume as part of its initialization - at least that's what I'm guessing
will work. I can't test it until I can get the kernel modules compiled.

 I believe Eduardo was using dual-primary. I simply need to do a
warm-failover (manual) in the event of a disaster.

 Probably need to do OCFS2-on-iSCSI with iSCSI-on-DRBD then.
 Florian

 I don't see the need for that level of complexity. Before we'd do that
many levels of 
 misdirection on a single host (SAS/Raid/DRBD/IET/OVM/OCFS2), I think a SAN
would be a 
 better choice. My goal is to get this running in a single box (per site).

I got an answer from forums.oracle.com. There is a development platform for
this sort of thing. Since I heard nothing from the DRBD community, I'll
assume I'm in new territory. I'll report back after I build a drbd kernel
rpm (or the 3.1.1 Oracle VM comes out with drbd built in - it's to use the
new UEK2 kernel).

REF: https://forums.oracle.com/forums/thread.jspa?threadID=2368799


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd storage for Oracle VM

2012-04-03 Thread Dan Barker
Eduardo, thank you for your input, but Virtual Box (workstation product) is a 
completely different animal than Oracle VM (bare-metal hypervisor). I don’t get 
to choose the underlying storage, and OVM3 choses OCFS2.

 

I guess I’ll simply have to try it out and see if DRBD will compile/run/etc.

 

Dan

 

From: Eduardo Diaz - Gmail [mailto:ediaz...@gmail.com] 
Sent: Tuesday, April 03, 2012 5:44 AM
To: Dan Barker
Cc: drbd List
Subject: Re: [DRBD-user] drbd storage for Oracle VM

 

I am using Active/Active drbd with virtualbox ocfs2, if you have any questions 
feel free to ask.

My experience is if you don't need ocfs2 don't use... is better use reiserfs or 
xfs and  XEN (if you don't know abour virtualization you can use virtualbox, it 
is more easy).

If you need more info fell free contact by chat in gmail.

regards!



On Sat, Mar 31, 2012 at 7:57 PM, Dan Barker dbar...@visioncomm.net wrote:

I'm looking at a Primary/Secondary, Protocol B (or maybe A with Proxy) for a
warm-DR site of a one-host, 4VM Oracle VM 3.0.3 system.

Anyone got any experience with OVM 3 and DRBD?

I was planning to put DRBD on the OVM 3 host and have it provide /dev/drbd0
to OVM for its repository (OVM3 Repositories are OCFS2, in case you are not
familiar).

I have not even checked that it will compile, but if someone has done it
before that would be good to know.

Specific hardware is a dual-XEON 5690 box with an RAID 1+0 array Areca 1222
(multi-terabyte), 4 Gig NICs and 96G Ram in a 2U enclosure.

Dan, in Atlanta

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

 

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd storage for Oracle VM

2012-04-03 Thread Dan Barker
 Eduardo, thank you for your input, but Virtual Box (workstation
 product) is a completely different animal than Oracle VM (bare-metal
 hypervisor). I don't get to choose the underlying storage, and OVM3 choses 
 OCFS2.



 I guess I'll simply have to try it out and see if DRBD will compile/run/etc.

 It will. However OCFS2 will only work on Protocol C (dual-Primary mode 
 requires
 synchronous replication by definition), so your original plan of using 
 protocol
 A or B is out.

 Why do you need to compile though? I thought UEK2 shipped with DRBD.
 Or has that not been made available for Oracle VM yet?

 Florian


That sounds encouraging. Not having to compile will mean I don't have to locate 
all the dependencies. 

UEK2 shipped with DRBD. How would I check?

I have no need for dual-primary and I can't see how OCFS2 can even tell if the 
underlying block device is dual-primary or single-primary. I believe Eduardo 
was using dual-primary. I simply need to do a warm-failover (manual) in the 
event of a disaster.

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] drbd storage for Oracle VM

2012-03-31 Thread Dan Barker
I'm looking at a Primary/Secondary, Protocol B (or maybe A with Proxy) for a
warm-DR site of a one-host, 4VM Oracle VM 3.0.3 system.

Anyone got any experience with OVM 3 and DRBD?

I was planning to put DRBD on the OVM 3 host and have it provide /dev/drbd0
to OVM for its repository (OVM3 Repositories are OCFS2, in case you are not
familiar).

I have not even checked that it will compile, but if someone has done it
before that would be good to know.

Specific hardware is a dual-XEON 5690 box with an RAID 1+0 array Areca 1222
(multi-terabyte), 4 Gig NICs and 96G Ram in a 2U enclosure.

Dan, in Atlanta

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] corrupted data disk

2012-03-16 Thread Dan Barker
Prevent? Good applications.
Recover? Good backups.

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Umarzuki Mochlis
Sent: Friday, March 16, 2012 2:32 AM
To: drbd-user
Subject: [DRBD-user] corrupted data disk

will a corrupted data/filesystem on primary will also get copied to
secondary (garbage in, garbage out?) on drbd 8.3?
if yes, how to prevent this?

--
Regards,

Umarzuki Mochlis
http://debmal.my
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] centos 5 drbd 8 uptodate is false

2012-03-15 Thread Dan Barker
You don't have to shut down the VMs, but you have to have adequate storage.

Build a NEW dbrd array.
Migrate a your VMs to it.
Delete the old array.

Depending on your storage and VMs, you may be able to stage this in several
steps so you don't need all the storage to be duplicated. Assuming you have
VMa, VMb and VMc on Disk1, and VMd, VMe and VMf on Disk2, you could.

Build a drbd array on New space.
Migrate VMa, VMb and VMc to it.
Build a drbd array on the space of Disk1.
Migrate VMd, VMe and VMf to it.
Recover the space of Disk2.

hth

Dan Barker



-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Adi Spivak
Sent: Thursday, March 15, 2012 7:24 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] centos 5 drbd 8 uptodate is false

Ok, i now understand what i did wrong.
I tried to use drbd as a solution for a production server without turning
any on my VM off, however, i understart that this might not be the solution
i am looking for right now, as this is for a new setup only or if i am
willing to turn off the VMs...

For anyone that is interested i use an iscasi device to export the LVM
drives to the XEN server.
This solution will force me to export the DRBD devices out to XEN instead of
the Lvms and rebooting all of my servers and changing all of my config files
to reflect the new setup to support DRBD.


I will try to look for some other solution as turning off the production
servers is a bit of an issue...

Thank you all...


On 03/15/2012 12:56 PM, Andreas Kurz wrote:
 On 03/15/2012 09:20 AM, Adi Spivak wrote:
 Hi.
 I ma using the drbd that comes with centos.
 My centos version is 5.8

 My /proc/drbd say as following:

 version: 8.0.16 (api:86/proto:86)
 GIT-hash: d30881451c988619e243d6294a899139eed1183d build by 
 mockbu...@v20z-x86-64.home.local, 2009-08-22 13:27:08

  1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
 ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
 resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
 act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 
 changed:0

 
 This is definitely an unused device ... no writes since started. And 
 please don't use that ancient version but the latest 8.3.12 from 
 Centos Extras repo.
 
 Whatever device you are using in your Xen VMs it is not /dev/drbd1. 
 But if you show us your Xen VM config and the output of lvs command 
 it should be more clear what is going on.
 
 Regards,
 Andreas
 
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd read-only mode

2012-03-15 Thread Dan Barker
ro: doesn't mean Read Only in that context. I don't recall offhand what it
does mean, but my disks are all read-write and show ro: there.

 

Someone else will chime in with the real meaning, I'm sure.

 

Dan

 

From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of ??
Sent: Thursday, March 15, 2012 10:22 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] drbd read-only mode

 

Hi

I installed 8.4.1 drdb.

The cluster operates on a primary/primary mode.

However, the drives are mounted in the mode of read-only

[root@noc-1-m77 /]# cat /proc/drbd

version: 8.4.1 (api:1/proto:86-100)

GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by
r...@noc-1-synt.rutube.ru, 2012-03-14 10:05:49

0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-

  ^^

ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

 

Why not activate the read-write mode?

 

Config:

[root@noc-1-synt /]# cat /etc/drbd.d/r0.res

# create new

resource r0 {

  startup {

wfc-timeout 20;

degr-wfc-timeout 10;

# we will keep this commented until tested successfully:

become-primary-on both;

  }

 net {

protocol C;

allow-two-primaries;

after-sb-0pri discard-zero-changes;

after-sb-1pri discard-secondary;

after-sb-2pri disconnect;

 

}

# DRBD device

device /dev/drbd0;

# phisical device

disk /dev/vg_noc1synt/lv02;

meta-disk internal;

on noc-1-synt.rutube.ru {

   # IP address:port

   address 10.1.20.10:7788;

}

on noc-1-m77.rutube.ru {

   address 10.2.20.9:7788;

}

}

[root@noc-1-synt /]#

 

 

 

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Kernel hung on DRBD / MD RAID

2012-02-21 Thread Dan Barker
 On 02/21/2012 12:03 AM, Andreas Bauer wrote:
  So when vm-master is Primary, vm-slave is Secondary, and I 
  force-detach the
 backing device on vm-master, DRBD will automatically make vm-slave the 
 Primary and direct writes to that host?
 
 no.
 
 The secondary remains secondary. However, the primary cannot write to 
 its local disk (seeing as it is detached), so writes are done *only* 
 on the secondary (normally, any write is done on both nodes).
 
 When the diskless node eventually gets access to a backing storage 
 device again (the old disk or a new one), it resyncs with the UpToDate 
 one and you're back to normal.
 
 Of course, if you loose connectivity or the peer's disk, you're down 
 to no disk, and therefore out of operation.
 
 HTH,
 Felix

 Thanks, it does help. So a node can be Primary and inconsistent while 
 the opposite node is Secondary and UpToDate and read requests are
(necessarily) 
 satisfied over network. Didn't know that.

 regards,

 Andreas

and Writes!

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Backing up VMware VM's running DRBD with VMware Data Recovery (VDR)

2012-02-20 Thread Dan Barker
Mark said: Snapshot creation requires that VMware tools are installed.
This is not correct (at least on ESXi4 and 5).

quiesce VM requires VMware tools, VDR may require VMware tools, but
snapshots do not.

This doesn't affect the OP's problem, but the incorrect statement needed
illumination.

Dan in Atlanta


-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Pascal BERTON
(EURIALYS)
Sent: Friday, February 17, 2012 6:52 AM
To: 'Mark Watts'; drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Backing up VMware VM's running DRBD with VMware
Data Recovery (VDR)

Mark,

Snapshot creation requires that VMware tools are installed, which brings the
sync driver onboard. Cannot quiesce VM means it is unable to contact the
sync driver. Have you installed the VMware Tools ?

Best regards,

Pascal.

-Message d'origine-
De : drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] De la part de Mark Watts
Envoyé : vendredi 17 février 2012 10:29 À : drbd-user@lists.linbit.com
Objet : [DRBD-user] Backing up VMware VM's running DRBD with VMware Data
Recovery (VDR)

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


I have a pair of CentOS 5.7 VM's running an LVM/DRBD/EXT3 Pri/Sec cluster.

Since we use VDR to take snapshots of our VM's I naturally added these two
VM's to the backup rota.

Pretty much every night I get hundreds of errors in the VDR logs relating to
the Primary, giving the message:

Failed to create snapshot for VDR01, error -3960 ( cannot quiesce virtual
machine)


I'm taking a wild guess at this perhaps being related to DRBD; can anyone
suggest whether this is the issue?


Mark.

- --
Mark Watts BSc RHCE
Senior Systems Engineer, MSS Secure Managed Hosting www.QinetiQ.com QinetiQ
- Delivering customer-focused solutions GPG Key:
http://www.linux-corner.info/mwatts.gpg
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.14 (GNU/Linux)
Comment: Using GnuPG with CentOS - http://enigmail.mozdev.org/

iEYEARECAAYFAk8+HdoACgkQBn4EFUVUIO2jEgCfecWw3SEZBS8Of0QWdeDII4Kw
sksAn13yrWX7+7Nz1katOuUfH/5hUbO8
=IQhD
-END PGP SIGNATURE-
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD+Pacemaker: Won't promote with only one node

2012-01-05 Thread Dan Frincu
Hi,

On Wed, Jan 4, 2012 at 10:10 PM, William Seligman
selig...@nevis.columbia.edu wrote:
 I'll give the technical details in a moment, but I thought I'd start with a
 description of the problem.

 I have a two-node active/passive cluster, with DRBD controlled by Pacemaker. I
 upgraded to DRBD 8.4.x about six months ago (probably too soon); everything 
 was
 fine. Then last week we did some power-outage tests on our cluster.

 Each node in the cluster is attached to its own uninterruptible power supply;
 the STONITH mechanism is to turn off the other node's UPS. In the event of an
 extended power outage (this happens 2-3 times a year at my site), it's likely
 that one node will STONITH the other when the other node's UPS runs out of 
 power
 and shuts it down. This means that when power comes back on, only one node 
 will
 come back up, since the STONITHed UPS won't turn on again without manual
 intervention.

 The problem is that with only one node, Pacemaker+DRBD won't promote the DRBD
 resource to primary; it just sits there at secondary and won't start up any
 DRBD-dependent resources. Only when the second node comes back up will 
 Pacemaker
 assign one of them the primary role. I've confirmed this by shutting down
 corosync on both nodes, then bringing it up again on just one of them.


Could you also post your Pacemaker configuration?

Also you might want to check
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#id890288
for no-quorum-policy, in two-node clusters, losing one node means you
don't have quorum, and unless you something else as a quorum device,
then the policy is set to stop.

HTH,
Dan

 I'm pretty sure that this is due to a mistake Ive made in made in my DRBD
 configuration when I fiddled with it during the 8.4.x upgrade. I've attached 
 the
 files. Can one of you kind folks spot the error?

 Technical details:

 Two-node configuration: hypatia and orestes
 OS: Scientific Linux 5.5, kernel 2.6.18-238.19.1.el5xen
 Packages:
 drbd-8.4.1-1
 corosync-1.2.7-1.1.el5
 pacemaker-1.0.12-1.el5.centos
 openais-1.1.3-1.6.el5

 Attached: global_common.conf, nevis-admin.res

 --
 Bill Seligman             | Phone: (914) 591-2823
 Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
 PO Box 137                |
 Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/

 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user




-- 
Dan Frincu
CCNA, RHCE
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Is VMFS5 cluster aware?

2011-11-30 Thread Dan Barker
Is the new filesystem for VMWare's vSphere 5 cluster aware? I continue to
see references to GFS2 and OCFS2 but never a mention of VMFS5. I'm just
curious for now, because I'm running single primary. If I could save the
network latency for reads from one host that would be sweet!

Dan

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Enabling a stacked resource - HELP!

2011-09-15 Thread Dan Barker
And on 8.4.0 up, you can use the same IP and port. See volumes: 
http://www.drbd.org/users-guide/ap-recent-changes.html

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of SlingPirate
Sent: Thursday, September 15, 2011 8:28 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Enabling a stacked resource - HELP!


For anyone using google and stuck like I was, you can use the same net card 
just need diferent ports assigned to each resource.


SlingPirate wrote:
 
 Thanks for your reply Matthieu, I'm now trying it as seperate DRBD's 
 for each of the VMs as live migration would be nice but it's not a necessity.
 
 Only now I seem to be running into a problem of not being able to use 
 the same network card for multiple DRBD resources. I'm hoping that I 
 won't have to use a different IP's for each resource!
 
 
 
 
 Matthieu Labbé wrote:
 
 Hello,
 
 2011/9/13 SlingPirate sling.t...@googlemail.com:

 Thank you for your input Lars, and apologies for my lack of 
 understanding.  I have done some more reading and I am still 
 somewhat unsure as how to approch this. I will try to re-explain 
 what I'm after and I would be grateful for any more of your time and 
 guidance.

 * On-site **** Off-site *** = 
 NODE-1 == NODE-2 =  = NODE-DR = =   VM-1   
 ==   VM-1   =  =   VM-1 = =   VM-2   
 ==   VM-2   =  =   VM-2 = =   VM-3   
 ==   VM-3   =  =   VM-3 = =   VM-4   
 ==   VM-4   =  =   VM-4 =

 Under normal conditions I would like to run VM-1  VM-2 on NODE-1 
 while
 VM-3
  VM-4 run on NODE-2.

 If NODE-1 dies NODE-2 runs all the VMs or visa-versa. If the site 
 that
 NODE-1  NODE-2 are based at blows up, NODE-3 is ready to run all VMs.

 I can use either block devices or image files for the VMs.

 What would you suggest would be the correct way to set this up?

 
 You need more than one DRBD, for example:
 DRBD0 for VM-1  VM-2, Primary on NODE-1, Secondary on NODE-2
 DRBD1 for VM-3  VM-4, Primary on NODE-2, Secondary on NODE-1 or use 
 one DRBD per VM as suggested before.
 
 You only need dual primary if you want to do live migration between
 NODE-1 and NODE-2 and in that case the classic 3-way setup doesn't 
 work.
 If you don't need live migration, do classic 3-way setups.
 
 Hope that help,
 Matt.
 
 --
 Matthieu Labbé
 http://mattlabbe.com/
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
 
 
 
 

--
View this message in context: 
http://old.nabble.com/Enabling-a-stacked-resource---HELP%21-tp32423513p32471169.html
Sent from the DRBD - User mailing list archive at Nabble.com.

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRDB + OCF Active Active

2011-09-08 Thread Dan Frincu
Hi,

On Wed, Sep 7, 2011 at 10:28 PM, Nick Khamis sym...@gmail.com wrote:
 Hello Everyone,

 We are looking to setup write intesive services using database
 technologies. Doing some research I yielded
 the attached document. Is there a issue in terms of performance using
 DRBD on an active active, and say
 mysql database? That being said, what is the best combination for
 clustering using DRBD:

 OCFS2 active active
 EXT3 active passive

The second one.


 On top of HA, load balancing is also important to us.

The document also detailed MySQL Cluster using the ndb engine. There
are some benefits for using such a solution, but also downsides,
therefore it is best to evaluate all of your requirements and see
which fits best.

On the MySQL Cluster with ndb approach, you have to assess what will
be the database/s size/s, what would be an estimated growth per week,
month, year, and plan your hardware requirements accordingly, as well
as plan for expansion, as ndb is an in-memory database, which on MySQL
Cluster scales to multiple nodes by partitioning the database and
having a primary copy of a partition on a node, and one or more
(minimum is one) backup copy of the same partition on another node
(all stored in RAM). As more nodes are added to the MySQL Cluster, the
partitions are split further and replicated onto new nodes as well, it
allows linear scaling iirc.

Every node maintains a transaction log on disk, therefore allowing
recovery of a node based on this log. However, a node failure does not
 lead to service interruption, as there always is at least one other
node maintaining a backup copy of the partition of the failed node in
memory. When failure is detected, the node keeping the backup copy
makes his copy primary and sets a new node to keep a backup copy.
Also, all transactions are performed atomically via a two-phase commit
protocol.

MySQL Cluster usually does not imply the usage of another clustering
technology on the same nodes, and given the high memory consumption,
it's usually not the case to mix things where it isn't needed. One
possible scenario would be to have all writes done on the MySQL
Cluster, and from it have N frontends set up as Replication Slaves for
Read requests. Load balancing writes can be done by having a frontend
issue requests to each Data Node, but it's recommended that requests
are sent to the DC (IIRC) and it will (based on whom has the writeable
copy of a partition) send the request to the Data Node storing it,
then that Node will send the reply to the frontend.

There are multiple scenarios possible, but they usually involve having
writes performed on the MySQL Cluster (it supports simultaneous access
for read/write operations on every node via the two-phase commit
protocol) and having read requests either on the Cluster or on
Replication Slaves with the second option being the recommended one.

MySQL Cluster holds all the databases in memory, so it's very fast,
has self healing capabilities, built-in high availability, it can also
use all of the CPU cores in a system and it relies on network
transport for communication between nodes, thus one can upgrade the
interconnects to Infiniband or some other solution for maximum
performance.

In terms of planning for MySQL Cluster, the following links will give
some insight:
http://www.severalnines.com/sizer/
http://www.fromdual.ch/mysql-cluster-memory-sizing

The only use case for DRBD in a MySQL Cluster would be to also
replicate the logs that it flushes to disk. In the event of a node
failure, the cluster is still fine, as explained above, but restoring
a node might take some time (get the log from the failed node, copy it
to a new node, load it into RAM, join the node to the cluster, the
node updates it's data to match the cluster state || fix the node,
restore or not the logs, load them into RAM, etc.). By having the logs
on a DRBD partition, and replicating it to another server, the data is
still available, if you have a spare server, then it's just a matter
of promoting the DRBD partition to Primary, mounting it, then export
it via NFS or whatever, mount it on the spare server, start loading
the logs into RAM, join the node to the Cluster. It reduces MTTR.

Hope it sheds some light onto the picture.

Regards,
Dan


 Thanks in Advnace,

 Nick

 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user





-- 
Dan Frincu
CCNA, RHCE
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Latency problems with DRBD 0.8.4 and a 3ware RAID5

2011-08-31 Thread Dan Barker
I sure hope that 0.8.4 is a typo. We are up to version 8.4.0. If you are really 
running drbd from years ago, I'd suggest an upgrade. Sorry, but I don't have 
any input on your throughput issue.

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Florian Apolloner
Sent: Wednesday, August 31, 2011 7:46 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Latency problems with DRBD 0.8.4 and a 3ware RAID5

Hi,

I am having a horrible latency with the following configuration:

 * protocol c
 * raid5 over a 3Ware 9750-4i SAS2 raid controller
 * Dedicated Gigabit link between the two machines, no switch in between.

This is what I can tell you:

 * iperf shows around 950 Mbits/sec -- sounds okay for gigabit ;)
 * throughput for the drbd raid has 70-80 MByte/sec -- sounds good to me too.

Now on to the latency tests (backing device):

dd if=/dev/zero of=/dev/sda3 bs=512 count=1000 oflag=direct
512000 bytes (512 kB) copied, 0.0365779 seconds, 14.0 MB/s

and onto the drbd device:

dd if=/dev/zero of=/dev/drbd1 bs=512 count=1000 oflag=direct
512000 bytes (512 kB) copied, 9.651 seconds, 53.1 kB/s

Sounded awfully slow to me, so I compared with a ramdisk:

dd if=/dev/zero of=/dev/drbd2 bs=512 count=1000 oflag=direct
512000 bytes (512 kB) copied, 0.166689 seconds, 3.1 MB/s

Sounds wy better, though I can't tell if that's slow or fast enough.

The RTT for the link is:
rtt min/avg/max/mdev = 0.096/0.166/0.207/0.043 ms

So let's summarize:

- Harddisk speed shouldn't be a problem (0.0365779 seconds for the 
- write) Network speed should be a problem as indicated by RTT and 
- iperf

- Network and Harddisk combined - problems.

Any hints on how to debug that? I am currently grasping any straw I can get :/

Thx in advance and regards,
Florian Apolloner


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] data integrity in drbd

2011-08-26 Thread Dan Barker
Reading between the lines on this thread, I think you have mixed access paths, 
and now believe that drbd is somehow involved in your troubles.

It is perfectly correct to do what you attempted, IE: Break the mirror, test 
some process, reestablish the mirror using the original data, and continue on.

Assuming the drbd resource r0 as /dev/drbd0, stored on /dev/sdb1 on nodeA 
(Primary) and nodeB (Secondary):

This scenario would be accomplished by:

On NodeB:
=
drbdadm disconnect r0
drbdadm primary r0
mount /somewhere /dev/drbd0

... do your test on nodeB ...

umount /dev/drbd0
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

and the nodes will sync up to the original data. The main node, nodeA always 
has the resource as primary, and goes from Connected to WFConnection to 
SyncSource to Connected. No access to the physical device, /dev/sdb1 is done by 
any process other than drbd. You never stop drbd on either node.

I'm guessing that at some point you mounted /dev/sdb1 instead of /dev/drbd0 and 
that is the source of your problems; Updates occurred that drbd did not see. 
Using the full disk rather than a partition (/dev/sdb instead of /dev/sdb1 in 
this case) could assist in preventing you from shooting yourself in the foot. 
But, you can ALWAYS shoot yourself in the foot.

I do not understand your comment I shut down drbd, try to use rsync to 
recovery the snapshot, but failed It sounds like there is a Logical Volume 
Manager involved, or some other underlying block device that supports snapshots.

Again, drbd can only replicate changes to a block device that occur while drbd 
is handling that device. The observations above can be applied to any block 
device. If you expect drbd to be able to handle the block device, it must be 
the only access to that device; Connected or not, Primary or Secondary. 

hth

Dan Barker

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Arnold Krille
Sent: Friday, August 26, 2011 2:08 PM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] data integrity in drbd

On Friday 26 August 2011 19:50:29 you wrote:
 In my situation, what should I do to completely resync the data of 
 secondary node, including the drbd metadata ?

Delete the secondaries disk, create the meta-data anew and make it sync 
completely from the primary. And if something is still wrong in the files on 
disk, restore them from the backup...

(And don't ask public questions in private:)

Have a nice weekend,

Arnold

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD won't take 1G syncer rate

2011-08-05 Thread Dan Barker
You are still mixing megabits and megabytes. Your 1000 megabit pipe won't
take a 600 megabyte stream, or a 150 megabyte stream. The maximum is about
125 MBps.

 

DRBD talks (and is documented to talk) bytes. Most everyone else talks bits.

 

You don't mention the speed you are getting.

 

Also, if you have 3 resources syncing, each will try for the syncer limit.
So, to use 50% of your capacity to sync 3 resources, you'd specify the rate
as 21M. Note: you can change the rate on the fly, during a sync.

 

Dan 

 

From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Matt Baer
Sent: Friday, August 05, 2011 8:51 AM
To: Caspar Smit
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

I was playing with the settings yesterday and it let me set it at 600M and
it didn't make a difference in the sync speed at all.  I then tried it with
your suggestion, dropped it to 150M just to be safe.  Still no difference.
I wonder what the deal is.  Could it be that this is the initial sync?

 

On Aug 5, 2011 1:29 AM, Caspar Smit c.s...@truebit.nl wrote:
 Hi Matt,
 
 1000M means 1000 Mb/s NOT 1000mbps. To reach 1000M you should have at
least
 one (probably two) 10gbit interface(s). Since you have two 1gbit
interfaces
 (bonded with balance-rr?) a value between 100M and around 170M would be
more
 appropiate.
 
 Kind regards,
 Caspar
 Op 5 aug. 2011 08:21 schreef Matt Baer mb...@lrnet1.com het volgende:
 When setting the syncer rate in drbd.conf to 1G, it won't start, citing
 that
 1G is invalid. Get the same thing with 1000M. Any clue as to why? It
 explicitly states that mb...@lrnet1.com1G is acceptable in the docs.
 I've
 triple checked and both interfaces are auto-negotiated at 1000mbps full
 duplex.

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD won't take 1G syncer rate

2011-08-05 Thread Dan Barker
We've been off-list for a few messages, but this is now interesting enough
to be public. I apologize for the top-posting. Please read backwards. Dan

 

Well, to be certain, I'd dd the disks to zeros individually, and then start
with them sync'd. 

 

drbdadm down all

dd if=/dev/zero of=/drbdbackingdevice bs=1M oflag=direct

 

on both sides.

 

New Blank Disk:

===

#On both nodes, initialize meta data and configure the device.

drbdadm -- --force create-md r0

 

#They need to do the initial handshake, so they know their sizes.

drbdadm up r0

 

#They are now Connected Secondary/Secondary Inconsistent/Inconsistent.
Generate a new current-uuid and clear the dirty bitmap.

drbdadm -- --clear-bitmap new-current-uuid r0

 

#They are now Connected Secondary/Secondary UpToDate/UpToDate.

drbdadm primary r0

 

Now, recreate your empty ext3 file system and you are in sync.

 

Dan

 

From: Matt Baer [mailto:mb...@lrnet1.com] 
Sent: Friday, August 05, 2011 11:26 AM
To: Dan Barker
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

Well we're the perfect supplements for each other because, as you can see, I
don't know DRBD from a hole in the wall.

Yes, it's a brand new resource.  I don't think I would say it's full of
zeros, as it has a clean ext3 file system on it.

Yes, I would LOVE to skip the sync, I've been dealing with this for weeks
now and right when I was about to go live, I tested the failover and it
didn't work because of a service heartbeat wanted to start wasn't going all
that well.  While troubleshooting, I lost my 100% perfectly live server and
have to start from scratch.  Problem is I only have two days to do it and
the thing has to sync 1.8TB at 12MB/s.  I have no idea where the bottleneck
could be.  Two servers, a cable connecting eth1 to eth1, both are
auto-negotiated at 1gbps on their own /30 subnet.  The only thing there
would be garbage NIC cards, possible, but not probable, or the cable, more
likely, but I've never had an issue with it until now.  Freshly constructed
servers, too.

I tried the drbdsetup /dev/drbd0 syncer -r 120M, been running like that for
about 5 minutes now and it hasn't changed at all.



On Fri, Aug 5, 2011 at 10:13 AM, Dan Barker dbar...@visioncomm.net wrote:

Is this a brand new resource? Why are you doing a full sync? If it's brand
new (full of zeros), you can skip the sync. Instructions upon request.

 

Btw, I don't know why you are getting 12% of your requested syncer rate. I'm
not a hot-shot linux performace analyzer, but there is a bottleneck
somewhere. I get 25M routinely here on GB nics. I have my Syncer set to 25M.
It drops to about 14M (each) if 2 are syncing.

 

To change sync rate without stop/start drbd: drbdsetup /dev/drbd1 syncer -r
120M

 

AL Extents seems a bit low. I use 1801 (big prime number that felt about
right).

 

Dan

 

 

 

 

From: Matt Baer [mailto:mb...@lrnet1.com] 
Sent: Friday, August 05, 2011 11:03 AM
To: Dan Barker


Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

Ok, revised /etc/drbd.conf and restarted DRBD with the following

common { syncer { rate 100M; al-extents 257; } }


And I'm getting from /proc/drbd:

GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
buildsvn@c5-x8664-build, 2008-10-03 11:30:17
 0: cs:SyncTarget st:Secondary/Primary ds:Inconsistent/UpToDate C r---
ns:0 nr:281056 dw:272864 dr:0 al:0 bm:16 lo:257 pe:1969 ua:256 ap:0
oos:1308428488
[] sync'ed:  0.1% (1277762/1278028)M
finish: 25:57:39 speed: 13,904 (12,400) K/sec

And I only have one resource, r0.  All it's syncing right now is the post
mkfs.ext3 /dev/drbd0



On Fri, Aug 5, 2011 at 9:51 AM, Dan Barker dbar...@visioncomm.net wrote:

You are still mixing megabits and megabytes. Your 1000 megabit pipe won't
take a 600 megabyte stream, or a 150 megabyte stream. The maximum is about
125 MBps.

 

DRBD talks (and is documented to talk) bytes. Most everyone else talks bits.

 

You don't mention the speed you are getting.

 

Also, if you have 3 resources syncing, each will try for the syncer limit.
So, to use 50% of your capacity to sync 3 resources, you'd specify the rate
as 21M. Note: you can change the rate on the fly, during a sync.

 

Dan 

 

From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Matt Baer
Sent: Friday, August 05, 2011 8:51 AM
To: Caspar Smit
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

I was playing with the settings yesterday and it let me set it at 600M and
it didn't make a difference in the sync speed at all.  I then tried it with
your suggestion, dropped it to 150M just to be safe.  Still no difference.
I wonder what the deal is.  Could it be that this is the initial sync?

 

On Aug 5, 2011 1:29 AM, Caspar Smit c.s...@truebit.nl wrote:
 Hi Matt,
 
 1000M means 1000 Mb/s NOT 1000mbps. To reach 1000M you should have at
least

Re: [DRBD-user] DRBD won't take 1G syncer rate

2011-08-05 Thread Dan Barker
OK, You can skip the zero, but the devices won't pass an online verify. It
won't hurt anything. All the sectors that need to be synced will be synced.
The disks are mostly zeros, so an online verify wouldn't do much, but it's
nice to know.

 

To find out how long the DD will take, kill -USR1 taskid (frightening
command, but it makes dd tell you how far it's along and does NOT kill it.
The 1M blocksize will help a lot. dd defaults to 512).

 

If you do skip the dd, the first verify will identify all the non-zero
sectors, and they'll sync up 100's of times faster than a full sync.

 

Good Luck!

 

Dan

 

From: Matt Baer [mailto:mb...@lrnet1.com] 
Sent: Friday, August 05, 2011 12:10 PM
To: Dan Barker
Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

Roger that.  It's running, but they're 1.8TB a piece so it'll take a while.
Just wanted to let you know, no need for it to go to the list.  Thanks for
the help thus far, it's been difficult to deal with this and I have to get
it running ASAP.



On Fri, Aug 5, 2011 at 11:06 AM, Dan Barker dbar...@visioncomm.net wrote:

That's not the backing device. The backing device is something like
/dev/sdb. The drbd device is called device in your config. The backing
device is called disk.

 

It should not be mounted.

 

Dan

 

From: Matt Baer [mailto:mb...@lrnet1.com] 
Sent: Friday, August 05, 2011 11:44 AM


To: Dan Barker
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

After I down all, it won't let me issue the dd command citing:

dd: opening `/dev/drbd0': Read-only file system

On Fri, Aug 5, 2011 at 10:37 AM, Dan Barker dbar...@visioncomm.net wrote:

We've been off-list for a few messages, but this is now interesting enough
to be public. I apologize for the top-posting. Please read backwards. Dan

 

Well, to be certain, I'd dd the disks to zeros individually, and then start
with them sync'd. 

 

drbdadm down all

dd if=/dev/zero of=/drbdbackingdevice bs=1M oflag=direct

 

on both sides.

 

New Blank Disk:

===

#On both nodes, initialize meta data and configure the device.

drbdadm -- --force create-md r0

 

#They need to do the initial handshake, so they know their sizes.

drbdadm up r0

 

#They are now Connected Secondary/Secondary Inconsistent/Inconsistent.
Generate a new current-uuid and clear the dirty bitmap.

drbdadm -- --clear-bitmap new-current-uuid r0

 

#They are now Connected Secondary/Secondary UpToDate/UpToDate.

drbdadm primary r0

 

Now, recreate your empty ext3 file system and you are in sync.

 

Dan

 

From: Matt Baer [mailto:mb...@lrnet1.com] 
Sent: Friday, August 05, 2011 11:26 AM
To: Dan Barker


Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

Well we're the perfect supplements for each other because, as you can see, I
don't know DRBD from a hole in the wall.

Yes, it's a brand new resource.  I don't think I would say it's full of
zeros, as it has a clean ext3 file system on it.

Yes, I would LOVE to skip the sync, I've been dealing with this for weeks
now and right when I was about to go live, I tested the failover and it
didn't work because of a service heartbeat wanted to start wasn't going all
that well.  While troubleshooting, I lost my 100% perfectly live server and
have to start from scratch.  Problem is I only have two days to do it and
the thing has to sync 1.8TB at 12MB/s.  I have no idea where the bottleneck
could be.  Two servers, a cable connecting eth1 to eth1, both are
auto-negotiated at 1gbps on their own /30 subnet.  The only thing there
would be garbage NIC cards, possible, but not probable, or the cable, more
likely, but I've never had an issue with it until now.  Freshly constructed
servers, too.

I tried the drbdsetup /dev/drbd0 syncer -r 120M, been running like that for
about 5 minutes now and it hasn't changed at all.

On Fri, Aug 5, 2011 at 10:13 AM, Dan Barker dbar...@visioncomm.net wrote:

Is this a brand new resource? Why are you doing a full sync? If it's brand
new (full of zeros), you can skip the sync. Instructions upon request.

 

Btw, I don't know why you are getting 12% of your requested syncer rate. I'm
not a hot-shot linux performace analyzer, but there is a bottleneck
somewhere. I get 25M routinely here on GB nics. I have my Syncer set to 25M.
It drops to about 14M (each) if 2 are syncing.

 

To change sync rate without stop/start drbd: drbdsetup /dev/drbd1 syncer -r
120M

 

AL Extents seems a bit low. I use 1801 (big prime number that felt about
right).

 

Dan

 

 

 

 

From: Matt Baer [mailto:mb...@lrnet1.com] 
Sent: Friday, August 05, 2011 11:03 AM
To: Dan Barker


Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

Ok, revised /etc/drbd.conf and restarted DRBD with the following

common { syncer { rate 100M; al-extents 257; } }


And I'm getting from /proc/drbd:

GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
buildsvn@c5-x8664-build, 2008-10-03 11:30:17
 0

Re: [DRBD-user] DRBD won't take 1G syncer rate

2011-08-05 Thread Dan Barker
Actually, that makes some sense. If the network is way faster than the sync,
and the dd (which doesn't even use the network) is the same speed, then
there is something bad wrong with the underlying device or configuration.

 

You are not going to be happy until you get that fixed. Then, the dd should
beat the sync by a good bit.

 

You can skip the dd and build your filesystem and use it RIGHT NOW, but
there is still something wrong and the performance will probably suck. But,
you can get some work done. 

 

If fixing the underlying problem doesn't wipe your data, then an online
verify and disconnect/connect (of the secondary node) will get you synced up
with zero downtime.

 

hth

 

Dan

 

From: Matt Baer [mailto:mb...@lrnet1.com] 
Sent: Friday, August 05, 2011 12:34 PM
To: Dan Barker
Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

Ok, looking at this, my guess is that dd and the sync would take roughly the
same amount of time.  Actually, if I am to believe the output of the kill
command you included, using dd will actually take more time.  So I just take
your previous instructions and omit the dd command to skip it?

I don't care if the sync takes forever, I just want to be able to DO
something while it's syncing.



On Fri, Aug 5, 2011 at 11:15 AM, Dan Barker dbar...@visioncomm.net wrote:

OK, You can skip the zero, but the devices won't pass an online verify. It
won't hurt anything. All the sectors that need to be synced will be synced.
The disks are mostly zeros, so an online verify wouldn't do much, but it's
nice to know.

 

To find out how long the DD will take, kill -USR1 taskid (frightening
command, but it makes dd tell you how far it's along and does NOT kill it.
The 1M blocksize will help a lot. dd defaults to 512).

 

If you do skip the dd, the first verify will identify all the non-zero
sectors, and they'll sync up 100's of times faster than a full sync.

 

Good Luck!

 

Dan

 

From: Matt Baer [mailto:mb...@lrnet1.com] 
Sent: Friday, August 05, 2011 12:10 PM


To: Dan Barker
Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

Roger that.  It's running, but they're 1.8TB a piece so it'll take a while.
Just wanted to let you know, no need for it to go to the list.  Thanks for
the help thus far, it's been difficult to deal with this and I have to get
it running ASAP.

On Fri, Aug 5, 2011 at 11:06 AM, Dan Barker dbar...@visioncomm.net wrote:

That's not the backing device. The backing device is something like
/dev/sdb. The drbd device is called device in your config. The backing
device is called disk.

 

It should not be mounted.

 

Dan

 

From: Matt Baer [mailto:mb...@lrnet1.com] 
Sent: Friday, August 05, 2011 11:44 AM


To: Dan Barker
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

After I down all, it won't let me issue the dd command citing:

dd: opening `/dev/drbd0': Read-only file system

On Fri, Aug 5, 2011 at 10:37 AM, Dan Barker dbar...@visioncomm.net wrote:

We've been off-list for a few messages, but this is now interesting enough
to be public. I apologize for the top-posting. Please read backwards. Dan

 

Well, to be certain, I'd dd the disks to zeros individually, and then start
with them sync'd. 

 

drbdadm down all

dd if=/dev/zero of=/drbdbackingdevice bs=1M oflag=direct

 

on both sides.

 

New Blank Disk:

===

#On both nodes, initialize meta data and configure the device.

drbdadm -- --force create-md r0

 

#They need to do the initial handshake, so they know their sizes.

drbdadm up r0

 

#They are now Connected Secondary/Secondary Inconsistent/Inconsistent.
Generate a new current-uuid and clear the dirty bitmap.

drbdadm -- --clear-bitmap new-current-uuid r0

 

#They are now Connected Secondary/Secondary UpToDate/UpToDate.

drbdadm primary r0

 

Now, recreate your empty ext3 file system and you are in sync.

 

Dan

 

From: Matt Baer [mailto:mb...@lrnet1.com] 
Sent: Friday, August 05, 2011 11:26 AM
To: Dan Barker


Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

 

Well we're the perfect supplements for each other because, as you can see, I
don't know DRBD from a hole in the wall.

Yes, it's a brand new resource.  I don't think I would say it's full of
zeros, as it has a clean ext3 file system on it.

Yes, I would LOVE to skip the sync, I've been dealing with this for weeks
now and right when I was about to go live, I tested the failover and it
didn't work because of a service heartbeat wanted to start wasn't going all
that well.  While troubleshooting, I lost my 100% perfectly live server and
have to start from scratch.  Problem is I only have two days to do it and
the thing has to sync 1.8TB at 12MB/s.  I have no idea where the bottleneck
could be.  Two servers, a cable connecting eth1 to eth1, both are
auto-negotiated at 1gbps on their own /30 subnet.  The only thing there
would be garbage NIC cards, possible, but not probable

Re: [DRBD-user] DRBD on XenServer

2011-07-31 Thread Dan Barker
You will only see the node resources on the primary node.

On the master node, stop all processes using the drbd resource (I'll call it
r0) and drbdadm secondary r0.
On the slave node, drbdadm primary r0. Now the resource is available on
the slave node.

In practice, you have to trust a status of Up-To-Date in most situations.
Accessing the slave node while secondary would most likely corrupt it
(unless it's cluster aware - a different issue all together). drbd prevents
that (accessing a secondary resource).

hth

Dan

-Original Message-
From: drbd-user-boun...@lists.linbit.com
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of waynecsh
Sent: Thursday, July 28, 2011 12:08 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] DRBD on XenServer


Hi,

I'm trying to setup DRBD on citrix Free Xenserver in master/slave mode.

I've managed to go to the stage where 'cat /proc/drbd' shows the two nodes
are synchronised in primary/secondary mode, and defined a local storage on
the primary node using the xe sr-create.

However, on the slave xenserver, I cannot see the new storage. Even 'lvs' 
'vgs' command do not show anything.

Can anyone advice me on how to confirm that the newly created storage is
being replicated to the slave node? Do I need to define the storage again on
the slave node?

Thank you.

Regards,
Wayne
-- 
View this message in context:
http://old.nabble.com/DRBD-on-XenServer-tp32153683p32153683.html
Sent from the DRBD - User mailing list archive at Nabble.com.

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD 8.0.13-HA 2.1.3 sync after physical move

2011-07-31 Thread Dan Barker
All you need to do (besides assure the network is working – not “spotty”) is 
connect the devices. Secondary Up-To-Date will be the sync-target; that’s what 
secondary means. You don’t need to do any invalidation. The only way to mess up 
your data would be to set the Zimbra2 node to primary and then mount/access the 
data. THAT would create a split-brain. As long as Zimbra1 is primary and 
Zimbra2 is secondary, connecting the nodes successfully will resync in the 
proper direction.

 

hth

 

Dan “Top Poster”

 

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Bruce Wolfe
Sent: Friday, July 29, 2011 5:50 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] DRBD 8.0.13-HA 2.1.3 sync after physical move

 

Hi! Could use a little help. I'm a newb to DRDB-HA but understand the concepts 
and am handy on the CLI. In fact, I found that the Secondary server has been 
'disconnected' for over a month. These servers run only Zimbra at the moment 
running on CentOS 5.5. I hope what I provide below is enough to get the ball 
rolling.  

Thank you in advance for any prompt help you may be able to offer today.


I inherited this system and after a physical move of the bare metal to a faster 
connection at data center, drbd-ha seems to not connect and sync. 

The internal LAN is working as I can ping the ports from each server to the 
other.  But, it is spotty to be able to telnet in. The peer server, Zimbra2, 
gets it here and there using 192.168.1.2 but the master, Zimbra1, I can never 
get to work using the standard port, 'telnet 192.168.1.1 7788'

Despite that, Zimbra1 is st:Primary/Unknown and Zimbra2 is 
st:Secondary/Unknown. 
Since Zimbra2 has been drbd-ha offline for some time, I was advised by other on 
IRC to discard the data before reconnecting. 

Logged into Zimbra2 server and after disconnecting Zimbra2 (Secondary) using 
'drbdadm disconnect repdata' I performed 'drbdadm -- --discard-my-data connect 
repdata' but /proc/drbd on Zimbra2 still results in UpToDate. 

I even tried stopping HA on Zimbra2 but that didn't make a difference either.
Any ideas? I want Zimbra1 (Primary) to populate Zimbra2 as it is the working 
server with the most recent data.

Now, /dev/drbd0 exists on Zimbra2 but is not mounted that I can see. In order 
to --discard-my-data, does the drive need to be mounted?

On Zimbra1 it is mounted like this:
/dev/drbd0 on /opt type ext3 (rw)
On Zimbra2 there is no record of it being mounted.


Here is /proc/drbd for Zimbra1:
@zimbra1 ~]# cat /proc/drbd
version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe build by 
buildsvn@c5-x8664-build, 2008-10-03 10:12:56
 0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r---
ns:0 nr:0 dw:1505899068 dr:1246861951 al:3145701 bm:3145590 lo:0 pe:0 ua:0 
ap:0
resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/127 hits:353787267 misses:3888262 starving:1706 
dirty:741251 changed:3145701

Here is /proc/drbd for Zimbra2:
@zimbra2 ~]# cat /proc/drbd
version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe build by 
buildsvn@c5-x8664-build, 2008-10-03 10:12:56
 0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown   r---
ns:0 nr:0 dw:912 dr:41418 al:18 bm:62 lo:0 pe:0 ua:0 ap:0
resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/127 hits:210 misses:18 starving:0 dirty:0 changed:18

Also, here is /etc/drbd.conf if that helps.
@zimbra2 ~]# cat /etc/drbd.conf
global { usage-count no; }
resource repdata {
  protocol C;
  startup { wfc-timeout 0; degr-wfc-timeout 120; }
  disk { on-io-error detach; }
  syncer { rate 25M; }
  on zimbra1.marininstitute.org {
device /dev/drbd0;
disk /dev/VolGroup01/LogVol01;
address 192.168.1.1:7788;
meta-disk internal;
  }
  on zimbra2.marininstitute.org {
device /dev/drbd0;
disk /dev/VolGroup00/LogVol00;
address 192.168.1.2:7788;
meta-disk internal;
  }
}

Thank you in advance for any prompt help you may be able to offer today.

Bruce M. Wolfe, M.S.W., CIO

Description: Image removed by sender.

 24 Belvedere St.
San Rafael, CA 94901
  415/456.5692 x213 main
 415/257.2493 office
   415/456.0491 fax
   KI6BSL  ham



He that falls in love with himself will have no rivals. - Benjamin Franklin

~WRD000.jpg___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


  1   2   >