Re: [DRBD-user] Mount and use disk while Inconsistent?
Good to know, thanks. Cheers!

Dan

On 9/21/2018 1:55 PM, digimer wrote:
> Yup, it's fine. Note though: if the UpToDate node goes offline, the
> Inconsistent node will force itself to Secondary and be unusable. So while
> it's possible to mount and use, be careful that whatever is being used can
> handle having the storage ripped out from under it.
>
> digimer
>
> On 2018-09-21 01:52 PM, Dan Ragle wrote:
>> Just double checking. Is it ok to have a dual-primary setup where both
>> nodes are primary while one is still syncing?
>>
>>   [node1]# drbdadm status
>>   r0 role:Primary
>>     volume:0 disk:UpToDate
>>     volume:1 disk:UpToDate
>>     node2.mydomain.com role:Primary
>>       volume:0 replication:SyncSource peer-disk:Inconsistent done:99.46
>>       volume:1 peer-disk:UpToDate
>>
>> First time I've seen it in my testing. Nothing complained about it so I
>> *think* it's ok ...

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
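For anyone scripting around digimer's caution in this thread, it may help to check for an Inconsistent disk before relying on the syncing node. What follows is only a minimal sketch in Python, assuming the `drbdadm status` text layout quoted in the thread. The function name `inconsistent_volumes` is hypothetical, and the sample text is copied from the message rather than taken from a live system.

```python
import re

# Status text as quoted in the thread (not live output).
SAMPLE = """\
r0 role:Primary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  node2.mydomain.com role:Primary
    volume:0 replication:SyncSource peer-disk:Inconsistent done:99.46
    volume:1 peer-disk:UpToDate
"""

def inconsistent_volumes(status_text):
    """Return (owner, volume) pairs whose disk or peer-disk is Inconsistent.

    owner is the name on the most recent 'role:' line (the resource name for
    local volumes, the peer name for peer volumes). volume is None when the
    line carries no 'volume:N' prefix.
    """
    owner = None
    hits = []
    for raw in status_text.splitlines():
        line = raw.strip()
        role = re.match(r"(\S+) role:\S+", line)
        if role:
            owner = role.group(1)
            continue
        if re.search(r"\b(?:peer-)?disk:Inconsistent\b", line):
            vol = re.match(r"volume:(\d+)\b", line)
            hits.append((owner, int(vol.group(1)) if vol else None))
    return hits

for owner, volume in inconsistent_volumes(SAMPLE):
    print(f"{owner} volume {volume} is still Inconsistent")
```

On a real node the text would come from running `drbdadm status` and capturing its output; an empty result means no volume was reported as Inconsistent at that moment.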
[DRBD-user] Mount and use disk while Inconsistent?
Just double checking. Is it ok to have a dual-primary setup where both nodes
are primary while one is still syncing?

  [node1]# drbdadm status
  r0 role:Primary
    volume:0 disk:UpToDate
    volume:1 disk:UpToDate
    node2.mydomain.com role:Primary
      volume:0 replication:SyncSource peer-disk:Inconsistent done:99.46
      volume:1 peer-disk:UpToDate

First time I've seen it in my testing. Nothing complained about it so I
*think* it's ok ...
Re: [DRBD-user] DRBD9: full-mesh and managed resources
On Thu, Aug 18, 2016 at 6:03 AM, Veit Wahlich wrote:
> But the shortest link is not guaranteed. Especially after recovery from
> a network link failure.
> You might want to monitor each node for the shortest path.

The simplest solution here is to overbuild. If you are going to do a 3-node
'full-mesh' then you should consider 10G Ethernet (a Mellanox card with cables
is about US$20 on eBay!). Then you just enable STP on all the bridges and let
it be. Even if you are taking 2 hops, that should still be well over the
transfer rates you need for such a small cluster, and STP will eventually work
itself out.
Re: [DRBD-user] drbdmanage init with zfs storage?
On Tue, Aug 16, 2016 at 1:36 AM, Roland Kammerer wrote:
> The control volume itself is always on LVM and there is currently no way
> to change that. It's 2 x 4M. I don't see a reason to change that.

I'm running ZFS, so LVM for the cluster sits on top of that. It's fine, but it
is an extra layer, plus whatever the LVM guys don't like about /dev/zd* for
PVs. I've found people using loop mounts to get around this, but I think that
is doubly clumsy: zfs zpool to losetup to lvm :/

> You can put all your data volumes on ZFS (thin, thick, configurable
> block size and what not). Documented in the drbd9 user guide.

This is working fine. I configured the nodes to use the zfs storage driver and
that works perfectly. It's just the drbdpool volume that is convoluted, and I
was hoping for a more elegant solution.
[DRBD-user] drbdmanage init with zfs storage?
Hello!

First off, I'm new to drbd, or rather I haven't used it in a very very long
time. I'm looking for a way to:

a) init a new drbd cluster with drbdmanage backed by zfs (instead of lvm), or
b) init a new drbd cluster with drbdmanage and then change the control volume
   over to zfs afterwards.

I can't find documentation on either possibility.

For reference, this is what I'm trying to accomplish: a 'hyperconverged'
proxmox 4.2 cluster, with zfs as the system volume management, drbd syncing a
zvol, and lvm running on top of that zvol for proxmox live migration to work.

I really want to stick with zfs because of its checksums and, for some data
(backups and ISO/templates), compression and deduplication.

Thanks.
[DRBD-user] Is this configuration a bad idea?
Hi,

I have 2 servers running about 10 OpenVZ containers and 4 KVM VMs, using
Proxmox as a frontend. Both servers have 1TB hdds, mirrored using zfs.

I'm considering adding high availability to this setup. My idea: configure a
third server as a backup and use DRBD to mirror the data from both servers
over the network.

Most guides I've seen on the Internet assume you're going to use full disks
for DRBD mirrors. In this case, I would create a partition on each server
(200-300 GB) and use DRBD to constantly mirror data on the third server.

Is this setup possible? The third server would have 2 different DRBD
partitions.

What do you recommend running on top of the DRBD partitions? ext4?

Thank you.

Best regards,
Dan
Re: [DRBD-user] drbd: refactor use of first_peer_device()
[ For some reason I was looking at old warnings and this showed up. Sorry
  for sending these a long time after the fact. - dan ]

Hello Lars Ellenberg,

This is a semi-automatic email about new static checker warnings.

The patch 44a4d551846b: drbd: refactor use of first_peer_device() from
Nov 22, 2013, leads to the following Smatch complaint:

drivers/block/drbd/drbd_nl.c:688 drbd_set_role()
	 error: we previously assumed 'peer_device' could be null (see line 560)

drivers/block/drbd/drbd_nl.c
   559		struct drbd_peer_device *const peer_device = first_peer_device(device);
   560		struct drbd_connection *const connection = peer_device ? peer_device->connection : NULL;
   	^^^
Check.

   561		const int max_tries = 4;
   562		enum drbd_state_rv rv = SS_UNKNOWN_ERROR;
   563		struct net_conf *nc;
   564		int try = 0;
   565		int forced = 0;
   566		union drbd_state mask, val;
   567

[ snip ]

   684
   685		if (device->state.conn >= C_WF_REPORT_PARAMS) {
   686			/* if this was forced, we should consider sync */
   687			if (forced)
   688				drbd_send_uuids(peer_device);
   	^^^
Dereferenced inside the function.

   689			drbd_send_current_state(peer_device);
   690		}

regards,
dan carpenter
Re: [DRBD-user] Considerations for using bcache on top of DRBD
-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Andrew Martin
Sent: Tuesday, February 18, 2014 2:14 PM
To: Berto Bermudez
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Considerations for using bcache on top of DRBD

> ----- Original Message -----
> > From: Berto Bermudez be...@momar.com
> > To: Andrew Martin amar...@xes-inc.com, drbd-user@lists.linbit.com
> > Sent: Monday, February 17, 2014 2:06:33 PM
> > Subject: RE: [DRBD-user] Considerations for using bcache on top of DRBD
> >
> > You didn't indicate where in your setup the SSDs would sit. I would
> > suggest you look at also replicating the ssd contents using drbd and
> > layering bcache on top of that, so that in the event of failover you
> > don't have the penalty of cold caches.
> >
> >   HDDs -- md/raid -- LVM -- DRBD -- bcache |
> >                                            |-- ext4
> >   SSD -- DRBD -- bcache                    |
> >
> > There was a talk about something similar, but using flashcache:
> > http://www.youtube.com/watch?v=l910kiEuHOM
> >
> > Berto
> > Momar, Inc.
>
> Hi Berto,
>
> I was intending to keep the SSDs as the top layer, above DRBD. If I
> understand correctly, you're suggesting creating a separate DRBD device on
> top of the SSD and then passing that DRBD device to bcache as the cache
> device, thus propagating the cache changes between nodes?
>
>   backing device: HDDs -- md/raid -- LVM -- DRBD -- bcache -- ext4
>   cache device:   SSDs -- md/raid -- DRBD -- bcache
>
> Could this create a problem where during failover the cache device is held
> open and thus can't be promoted to primary on the other node? In writeback
> mode this would prevent the device from being usable until the cache device
> could be failed over successfully.

In the video, Florian discusses this. You put both the SSD and the HDD in the
same resource (requires version 8.4). In that way, the HDD and SSD remain
linked, as intended. Putting them in separate resources, a la version 8.3,
would be a mess, as you note.

Dan
Re: [DRBD-user] drbd Input/output error
-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Piotr Kloc
Sent: Friday, February 14, 2014 7:49 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] drbd Input/output error

> Hello !
>
> I have problem with drbd
>
> primary
>
>   [root@wirt ~]# cat /proc/drbd
>   version: 8.3.13 (api:88/proto:86-96)
>   GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
>    1: cs:StandAlone ro:Primary/Unknown ds:Diskless/DUnknown r-
>       ns:0 nr:72 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>   [root@wirt ~]#
>
> and secondary
>
>   [root@wirt2 ~]# cat /proc/drbd
>   version: 8.3.13 (api:88/proto:86-96)
>   GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
>    1: cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown C r-
>       ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>   [root@wirt2 ~]#
>
> on primary i got
>
>   [root@wirt ~]# lvs
>   /dev/drbd1: read failed after 0 of 4096 at 1892425662464: Input/output error
>   /dev/drbd1: read failed after 0 of 4096 at 1892425728000: Input/output error
>   /dev/drbd1: read failed after 0 of 4096 at 0: Input/output error
>   /dev/drbd1: read failed after 0 of 4096 at 4096: Input/output error
>   LV   VG  Attr   LSize   Pool Origin Data% Move Log Cpy%Sync Convert
>   root vg0 -wi-ao  50.00g
>   swap vg0 -wi-ao  16.00g
>   tmp  vg0 -wi-ao  25.00g
>   vm1  vg1 -wi-ao   1.17t
>   vm2  vg1 -wi-ao 400.00g
>
> I have LVM on DRBD and DRBD is on the /dev/md2 RAID1 system

No, drbd is not on anything:

  cs:StandAlone ro:Primary/Unknown ds:Diskless/DUnknown

You have not shown the resource definition files, so we can only guess. What
has been done after / filled up? I would think at least freeing some space and
a reboot, but you don't say.

>   device minor 1;
>   disk /dev/md2;
>   address IP:7788;
>   meta-disk internal;
>
> There was temporary out of space on / partition and then we got this errors
>
> How we can fix this now ? Can I run fsck on /dev/drbd1 ?

No, you can't do anything against a diskless resource.

> Piotr

The I/O errors are expected: drbd's backing devices are not present or not
available. LVM may have grabbed them if your filters are not set up correctly.
Again, please provide more documentation: output from mount, your drbd
resource definition files and your LVM filters.

Dan
Re: [DRBD-user] drbd not working with high mtu?
-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Harka Gyozo SA
Sent: Saturday, February 01, 2014 1:32 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] drbd not working with high mtu?

> Hi!
>
> Anyone knows about an issue with mtu? I tried to increase it to max (there
> are two nics with a crossover cable, so no switch, router etc.). I found
> that max mtu supported by my card's driver is 7200. ssh, ping (also with
> big packages), nfs works. drbd keeps saying:
>
>   block drbd1: [drbd1_worker/16675] sock_sendmsg time expired, ko = ...
>
> If I decrease the mtu to 6000, error message stops, and I can see the
> syncing in the /proc/drbd (while the mtu is high it's stalled). If I
> increase the mtu to 6200, kernel message returns, communication stalled.
>
> This is a bug? Or maybe I should increase buffer sizes in the config?
>
> [ version: 8.3.7 (api:88/proto:86-91) ]
>
> HARKA Győző

ssh, ping and nfs will all fragment packets. Try ping with a 10K size and the
don't-fragment switch set. Lower the size until it works.

  ping -M do -s 6800 -c 1 host

Dan
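The probe size in Dan's reply can also be derived from the MTU rather than found by trial and error. A minimal sketch in Python, assuming plain IPv4 without IP options (a 20-byte IP header plus an 8-byte ICMP echo header). The helper names are hypothetical, and `host` is the same placeholder used in the reply.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header

def max_ping_payload(mtu):
    """Largest `ping -s` value that still fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def probe_command(mtu, host="host"):
    """Build the Linux ping invocation with the don't-fragment flag set."""
    return ["ping", "-M", "do", "-s", str(max_ping_payload(mtu)), "-c", "1", host]

# MTU values mentioned in the thread, plus the common 1500 default.
for mtu in (1500, 6000, 6200, 7200):
    print(mtu, " ".join(probe_command(mtu)))
```

If the probe sized for 6000 gets through while the one sized for 6200 or 7200 does not, that would suggest the NICs or the driver are not really passing the larger frames, whatever MTU the interface accepts.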
Re: [DRBD-user] Antwort: Re: proto c - corrupt files - directories missing
-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Bauer, Stefan (IZLBW Extern)
Sent: Wednesday, January 22, 2014 10:11 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Antwort: Re: proto c - corrupt files - directories missing

> How do you trigger the email? I don't see anything in the manpage for the
> action OOS blocks. I only see local-io-error and stuff like this.
>
> Stefan

My global_common.conf contains:

  out-of-sync /usr/lib/drbd/notify-out-of-sync.sh use...@comain.ltd;

I did not look in the doc. Gotta run out.

Dan
Re: [DRBD-user] data mismatch when primary/secondary are both up2date
> In my cluster (node1/node2) with drbd, the state in /proc/drbd is
> primary/secondary up2date/up2date, but when I change primary to node2, the
> file that existed on node1 can not be found on node2. Then I do
> drbdadm verify drbd0 to verify and resync the data, node2's data returned
> to be OK.
> I am wondering how the problem occurs and how I can avoid it?
> BTW: I build a PV/VG/LV on drbd0, and drbd0 is built on a LV too. Is this
> the reason?
> Thanks.

This sounds as if you are mixing drbd device accesses and backing device
accesses, but insufficient details are given. Also, the verify succeeding
seems to say something else may be going on.

Please provide more information, such as your LVM setup, drbd config, file
systems and the test cases.

Dan in Atlanta
Re: [DRBD-user] Fwd: how to place secondary node in maintenance mode
Wow! Those are some old versions!

Just “drbdadm disconnect all” on the Primary node before the maintenance and
“drbdadm connect all” afterwards. All else is magic!

Dan

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Gabriel Sosa
Sent: Tuesday, September 24, 2013 11:08 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Fwd: how to place secondary node in maintenance mode

We have a pretty standard setup

  MASTER --- SLAVE (STANDBY)

Both nodes are in sync without much issue now, but during this week our DC
will perform some maintenance tasks that might cause some connectivity issues
between both nodes.

How can I put the slave node in maintenance (or any other mode) in order to
avoid any split brain situation?

I've been reading [1] and [2] but I can't find a clear answer to this.

Should I just take down the service on the SLAVE node by performing a
*service drbd stop*, or instead disable the resource using
*drbdadm down resourcename*?

versions:
MASTER - version: 8.0.16 (api:86/proto:86)
SLAVE - version: 8.3.15 (api:88/proto:86-97)

Thanks

[1] http://www.drbd.org/users-guide/ch-admin.html
[2] http://www.geekpeek.net/drbd-management-command-usage/

--
Gabriel Sosa
Sometimes the questions are complicated and the answers are simple. -- Dr. Seuss
Re: [DRBD-user] Unable to sync new machine
dmesg should show why earth won't stay in WFConnection. cat /proc/drbd just
shows the state; it doesn't show the why. vulcan won't connect because earth
is not in WFConnection.

Dan

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Hans Lammerts
Sent: Saturday, September 14, 2013 8:48 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Unable to sync new machine

Hi there,

I've been using drbd for approx. 2 years now, and never had any big problems.
Recently, one of two machines in my cluster crashed, and I had to reinstall it
completely. Now I seem to be unable to sync that second machine with the first
one.

The situation:
I'm using drbd 8.4.0.
Machine 1 is called earth, machine 2 is called vulcan.
Earth is the surviving half of my cluster, and vulcan had to be rebuilt.

The actions I've taken:
After compiling drbd on vulcan, I copied the resource file from /etc/drbd.d
on earth to the same place on vulcan. In this case, the resource file for
mysql, which looks like this:

  resource mysql {
    protocol C;
    syncer { rate 4M; }
    startup { wfc-timeout 15; degr-wfc-timeout 60; }
    handlers {
      split-brain /usr/lib/drbd/notify-split-brain.sh j.lamme...@chello.nl;
    }
    net {
      cram-hmac-alg sha1;
      shared-secret xxx;
      verify-alg sha1;
      after-sb-0pri discard-zero-changes;
      after-sb-1pri discard-secondary;
      after-sb-2pri disconnect;
    }
    on vulcan {
      device /dev/drbd0;
      disk /dev/sda5;
      address 192.168.0.15:7788;
      meta-disk internal;
    }
    on earth {
      device /dev/drbd0;
      disk /dev/sda5;
      address 192.168.0.5:7788;
      meta-disk internal;
    }
  }

Then, I created the device meta-data on vulcan:

  drbdadm create-md mysql

After (re)starting drbd on both machines, cat /proc/drbd shows this:

earth:

  [root@earth ~]# cat /proc/drbd
  version: 8.4.0 (api:1/proto:86-100)
  GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@earth, 2013-09-07 17:35:35
   0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-
      ns:0 nr:0 dw:19360172 dr:6497416 al:138 bm:50 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:216372

vulcan:

  [root@vulcan drbd.d]# cat /proc/drbd
  version: 8.4.0 (api:1/proto:86-100)
  GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@vulcan, 2013-09-12 16:25:17
   0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown rs
      ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:4652876

So both are StandAlone, and Primary/Unknown vs. Secondary/Unknown.

Having Googled for this situation and its solution, I tried the solution as
described in the Linbit DRBD manual, but stopping short of 5.4. The only thing
that happens (as far as I can see) is that earth briefly goes into the
WFConnection state, and nothing else.

I tried the following as well:

  On earth:  drbdadm connect all
  On vulcan: drbdadm -- --discard-my-data connect all
             (or drbdadm connect -discard-my-data mysql, can't remember exactly)

But this did not get the syncing of the resource started either.

I'm out of ideas, and can't really find anything searching Google different
from what I have already tried. So, please, if anyone can give me a clue on
how to resolve this situation, preferably without losing any data on earth, I
would be most grateful.

Thanks,
Hans
Re: [DRBD-user] generation identifiers
See below:

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of FDS | Forensik Data Services
Sent: Friday, September 06, 2013 9:06 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] generation identifiers

> Hi there,
>
> I am involved in the aftermath of a split brain event (drbd 8.3.8). The log
> files have given us several entries for drbd_sync_handshakes:
>
> 1. Feb 28 07:10:04 san02 kernel: [ 1435.507176] block drbd1: self 384ACFA6DDE7F305:6EC706A11FB6AAEB:AD2868A6C65F20AA:D5016EA9D8E57175 bits:28397184 flags:0
> 2. Feb 28 07:10:04 san02 kernel: [ 1435.507472] block drbd0: self 8FFC8087A5A8ACDF:128A2EF7246101ED:00C719D52E8C2482:4EEC506B23021307 bits:225135 flags:0
> 3. Feb 28 07:11:36 san01 kernel: [4427691.562720] block drbd1: self 16DA13A883B368DA:6EC706A11FB6AAEA:AD2868A6C65F20AA:D5016EA9D8E57175 bits:3072 flags:0
> 4. Feb 28 07:11:36 san01 kernel: [4427691.562923] block drbd0: self 2DBF62FFC79BF9F4:128A2EF7246101EC:00C719D52E8C2483:4EEC506B23021307 bits:53722300 flags:0
> 5. Feb 28 07:17:46 san02 kernel: [ 1897.442652] block drbd0: self 8FFC8087A5A8ACDE:8A4BA5176FB14BAD:128A2EF7246101ED:00C719D52E8C2482 bits:21858 flags:0
> 6. Feb 28 07:17:46 san02 kernel: [ 1897.443239] block drbd1: self 384ACFA6DDE7F304:E9797096C25A015C:6EC706A11FB6AAEB:AD2868A6C65F20AA bits:24262845 flags:0
> 7. Feb 28 07:19:19 san01 kernel: [4428153.502129] block drbd0: self 8A4BA5176FB14BAC::00C719D52E8C2483:4EEC506B23021307 bits:49588595 flags:0
> 8. Feb 28 07:19:19 san01 kernel: [4428153.502363] block drbd1: self E9797096C25A015C::AD2868A6C65F20AA:D5016EA9D8E57175 bits:0 flags:0
>
> The cluster consists of 2 nodes (san01 and san02) with two block devices
> (drbd01 and drbd02) running in dual primary with OCFS2.
>
> Analyzing the generation identifiers of the first handshake:
>
> * Current UUIDs (1.-4.) differ due to the split brain situation.
> * Bitmap UUIDs (1.-4.) are nearly identical. Only the last digit differs
>   (A-D).
> * First historical UUIDs (1.-4.) are identical for drbd1 but differ for
>   block drbd0. Again only in the last digit (2 vs 3).
> * Second historical UUIDs (1.-4.) are identical for both block devices,
>   meaning both devices have the same data set, or?
>
> Questions:
>
> * The reason for and meaning of the differing last digit of the UUIDs are
>   unknown. No information in the doc.

I don't know.

> * How can I see when the data sets were in sync for the last time?

Careful analysis of the logs.

> * Meaning of the values for flags, e.g. 0 or 2?

I don't know.

> * Is it possible to decide which resource (drbd0 or drbd1) was primary by
>   comparing UUIDs?

You had dual primary: they were both primary. You'll have to look at the data
itself. For example, if a file on one of the standalone drbd0 copies has a
newer date, that file is likely more valid than the other copy.

Depending on how and why you had dual primary without proper fencing in the
first place, it may be trivial to determine which side to keep and which to
discard-my-data. If they are dual primary solely to provide live migration,
AND no live migrations were taking place at the time of the split brain
event, you are fairly safe using the resource (or disk image) to which a VM
was pointed at the time of the event.

If the file systems were shared by multiple hosts, you are stuck with file by
file analysis, and there are no promises that the newest file will contain
all the data. You may need to do a row by row analysis depending on your
applications.

Again, without knowing the sharing and application logic of your specific
systems in detail, split-brain analysis is impossible.

hth and good luck! (Get fencing working! Get a current drbd installed!!)

Dan

> Appreciating your help.
>
> --
> Kind regards
> Ing. Mag. Horst Greifeneder (fds)
> FDS | Forensik Data Services. Better secure than sorry!
> Schenkelbachweg 32. A - 4600 W e l s.
> tel. 07242. 777 15.
> fax. 07242. 777 16.
> mailto:off...@fds.at
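The "only the last digit differs" observation in the question above can be made mechanical. What follows is just a sketch in Python that compares two identifiers while ignoring their lowest bit. Treating that bit as a flag rather than as part of the identity is an assumption made for illustration; the thread itself leaves the question unanswered. The names `same_generation` and `parse_uuids` are hypothetical, and the values are copied from log entries 1 and 3.

```python
def same_generation(uuid_a, uuid_b):
    """True if two 64-bit hex identifiers match once the lowest bit is masked.

    Assumption (not confirmed in the thread): the lowest bit is a flag and
    does not identify the data generation.
    """
    mask = 0xFFFFFFFFFFFFFFFE
    return (int(uuid_a, 16) & mask) == (int(uuid_b, 16) & mask)

def parse_uuids(field):
    """Split a 'self' field into its four colon-separated identifiers."""
    return field.split(":")

# drbd1 as logged on san02 (entry 1) and on san01 (entry 3).
san02_drbd1 = parse_uuids(
    "384ACFA6DDE7F305:6EC706A11FB6AAEB:AD2868A6C65F20AA:D5016EA9D8E57175")
san01_drbd1 = parse_uuids(
    "16DA13A883B368DA:6EC706A11FB6AAEA:AD2868A6C65F20AA:D5016EA9D8E57175")

labels = ("current", "bitmap", "history1", "history2")
matches = {
    label: same_generation(a, b)
    for label, a, b in zip(labels, san02_drbd1, san01_drbd1)
}
print(matches)
```

Under that assumption the two nodes agree on everything except the current identifier, which fits the poster's reading that the histories diverged only at the split brain. It says nothing about which side holds the data worth keeping.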
Re: [DRBD-user] csums-alg seems not working on my cluster....
That's a very difficult way to go about setting up internal metadata.
Normally we just create the metadata on the raw device (/dev/sdb) and then
create the filesystem on the drbd device (/dev/sql_data1). No math!

You did not appear to specify a syncer rate. I thought the default was much
higher than 2040K, but that's the target for the sync operation. Why not set
the sync rate to some reasonable percentage of the available bandwidth
(nearly all of it for the initial sync, maybe 30% of your bandwidth
thereafter)? You say "low" without defining it. The displays appear to be
constrained by the syncer rate.

Also, you don't have to do a full sync on initially empty disks. That's in
the doc under clear-bitmap and/or new-current-uuid.

You can modify the syncer rate while running, or in the config files and then
adjust the resources.

Dan

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lafaille Christophe
Sent: Thursday, September 05, 2013 9:37 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] csums-alg seems not working on my cluster

Hi All,

I need to use a very low bandwidth network between 2 machines using drbd, and
I am trying csums-alg/verify-alg. But I get the same duration with or without
csums-alg!

Execution with csums-alg:

  [root@sms246105 drbd.d]# cat /proc/drbd
  version: 8.4.2 (api:1/proto:86-101)
  GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@rh63_build, 2013-01-10 09:57:53
   1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r-
      ns:0 nr:512 dw:512 dr:147968 al:0 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:4904384
      [] sync'ed: 3.0% (4788/4932)M
      finish: 0:15:17 speed: 5,332 (5,284) want: 2,040 K/sec

Execution without csums-alg:

  [root@sms246105 drbd.d]# cat /proc/drbd
  version: 8.4.2 (api:1/proto:86-101)
  GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@rh63_build, 2013-01-10 09:57:53
   1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r-
      ns:0 nr:53760 dw:53760 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:4819904
      [] sync'ed: 1.2% (4704/4756)M
      finish: 0:14:52 speed: 5,376 (5,376) want: 2,040 K/sec

I don't know where the problem is... Is csums-alg usable only in a more
recent version of DRBD (like 8.4.3 or 8.4.4)?

I've built the drbd packages from source; perhaps I need to specify an option
in order to have the csums-alg facility (I'll check for this)?

I've put csums-alg in the net section, and in some web pages I've found a
syncer section with csums-alg (seems no longer available in 8.4.x versions).
==> what's the right place?

On both machines, I do this sequence:

  # /etc/init.d/drbd stop
  # delete all partitions on /dev/sdb and create a 5GB partition with fdisk
    (for my tests; real size is around 300GB)
  # partprobe /dev/sdb
  # dd if=/dev/zero of=/dev/sdb1 bs=4096    ==> to initialize disk content
  # mkfs.ext3 -j -m 0 -b 4096 /dev/sdb1
  # PARTSIZE=`sfdisk -s /dev/sdb1 | xargs -i echo {} 1024 / 1024 / p | dc`
  # NEWSIZE=$[${PARTSIZE}-2]
  # resize2fs /dev/sdb1 ${NEWSIZE}G
  # e2fsck -f /dev/sdb1
  # /etc/init.d/drbd start
  # /sbin/drbdadm create-md sqldata
  # /sbin/drbdadm up sqldata

On one machine:

  # /sbin/drbdadm --force primary sqldata

The file /etc/drbd.d/sqldata.res:

  resource sqldata {
    device /dev/drbd_sqldata minor 1;
    disk /dev/sdb1;
    meta-disk internal;
    on sms246104 { address 135.117.246.104:7788; }
    on sms246105 { address 135.117.246.105:7788; }
  }

The file /etc/drbd.d/global_common.conf:

  global {
    usage-count yes;
    dialog-refresh 1;
    minor-count 5;
  }
  common {
    handlers {
      pri-on-incon-degr /usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f;
      pri-lost-after-sb /usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f;
      local-io-error /usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f;
      split-brain /usr/lib/drbd/notify-split-brain.sh root;
    }
    startup {
      wfc-timeout 15;
    }
    options {
    }
    disk {
      on-io-error detach;
      c-plan-ahead 20;
      c-fill-target 50k;
      c-min-rate 250k;
      c-max-rate 2M;
    }
    net {
      timeout 60;
      ping-int 6;
      after-sb-0pri discard-younger-primary;
      after-sb-1pri discard-secondary;
      after-sb-2pri call-pri-lost-after-sb;
      ping-timeout 60;
      protocol C;
      cram-hmac-alg sha1;
      shared-secret TestHA;
      csums-alg sha1;
      verify-alg sha1;
    }
  }

Traces in /var/log/kern.log:

  Sep 5 13:03:16 sms246104 kernel: events: mcg drbd: 2
  Sep 5 13:03:16 sms246104 kernel: drbd: initialized. Version: 8.4.2 (api:1/proto:86-101)
  Sep 5 13:03:16 sms246104 kernel: drbd: GIT-hash
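Dan's suggestion in this thread, sizing the resync rate as a share of the link, can be worked through with the numbers posted. A minimal sketch in Python: the 30% share is the figure from the reply, the link speed passed in at the end is a made-up example value, and all function names are hypothetical. The time estimate is plain division at a steady speed, so it will not match DRBD's own `finish:` figure exactly.

```python
def resync_rate_kib(link_kib_per_s, share=0.30):
    """Resync rate as a fraction of the link's usable bandwidth, in KiB/s."""
    return round(link_kib_per_s * share)

def eta_seconds(out_of_sync_kib, speed_kib_per_s):
    """Rough time to finish a resync at a steady speed."""
    return out_of_sync_kib / speed_kib_per_s

def fmt(seconds):
    """Format seconds as h:mm:ss, the way /proc/drbd shows 'finish:'."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

# oos and speed taken from the "with csums-alg" /proc/drbd output above.
print("at observed speed:", fmt(eta_seconds(4904384, 5332)))
# The same backlog if the resync ran at the 2,040 K/sec shown as "want".
print("at wanted rate:   ", fmt(eta_seconds(4904384, 2040)))
# Hypothetical slow link with about 1220 KiB/s usable bandwidth.
print("30% of the link:  ", resync_rate_kib(1220), "KiB/s")
```

The estimate at the observed speed lands within a few seconds of the `finish: 0:15:17` in the post, which is consistent with both runs being limited by the transfer rate rather than by anything csums-alg did.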
Re: [DRBD-user] DRDB over Software RAID1 - Failure: (104) Can not open backing device
DRBD can't use /dev/md4 because it's in use. It has a mounted filesystem
using the entire device. You have your resource stack out of order.

You would share the drbd device created from the md4 device. However,
/dev/md4 already has a file system, so you must shrink it or use external
metadata. I'd suggest you shrink it and use internal metadata.

What you have:

* There are various disk devices (not specified)
* Upon which is run raid providing /dev/md4
* Which contains an ext2 filesystem (/srv)
* Which is shared.

What you want:

* There are various disk devices (not specified)
* Upon which is run raid providing /dev/md4
* Which is the backing device for DRBD
* Which provides /dev/drbd1
* Which contains an ext2 filesystem (/srv)
* Which is shared.

Clear as mud?

Dan

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Martin Krammer, New Media Interactive
Sent: Monday, August 19, 2013 4:16 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] DRDB over Software RAID1 - Failure: (104) Can not open backing device

I have two webservers running debian 7 (stable) with software RAID1 and DRBD
8.3.11. On both servers there are the following shares, which should be
connected:

  On stella: /dev/md4 919014380    204664 872126436  1% /srv
  On laura:  /dev/md4 442143360 153168996 266514700 37% /srv

Later, the data of laura should be synchronized on /dev/drbd1.

The conf-file looks like:

  resource r1 {
    on stella {
      device    /dev/drbd1;
      disk      /dev/md4;
      address   192.168.1.1:7789;
      meta-disk /dev/sdb3[0];
    }
    on laura {
      device    /dev/drbd1;
      disk      /dev/md4;
      address   192.168.1.2:7789;
      meta-disk /dev/sdb2[0];
    }
  }

If I try to attach...

  root@stella:/srv# drbdadm attach r1
  --== Thank you for participating in the global usage survey ==--
  The server's response is:
  node already registered
  1: Failure: (104) Can not open backing device.
  Command 'drbdsetup 1 disk /dev/md4 /dev/sdb3 0 --set-defaults --create-device --on-io-error=detach' terminated with exit code 10

Could anybody help me please?

Martin.
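The "backing device is in use" diagnosis above can be checked before running `drbdadm attach`. A small sketch in Python, assuming /proc/mounts-style text as input. The function name `mount_points` is hypothetical, and the sample text is invented to resemble the setup in the thread (the root line in particular is made up).

```python
def mount_points(device, mounts_text):
    """Return the mount points where `device` appears in /proc/mounts-style text.

    Only exact device-name matches are found; the same device mounted under
    an alias or symlink name would be missed.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points

# Invented sample in /proc/mounts format, modelled on the thread's description.
SAMPLE_MOUNTS = """\
/dev/md2 / ext4 rw,relatime 0 0
/dev/md4 /srv ext2 rw,relatime 0 0
"""

in_use = mount_points("/dev/md4", SAMPLE_MOUNTS)
if in_use:
    print("/dev/md4 is mounted on", ", ".join(in_use),
          "and cannot be opened as a DRBD backing device")
```

On a live system the text would come from reading /proc/mounts. Once the stack is reordered as Dan describes, the same check should list /dev/drbd1 for /srv rather than /dev/md4.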
Re: [DRBD-user] Sync of Nodes
-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Walter Robert Ditzler
Sent: Monday, August 19, 2013 5:44 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Sync of Nodes

> Hi all,
>
> I just made a test this weekend onto my new 2 XEN hosts:
>
> - Debian Wheezy, Kernel 3.10.7, DRBD Tools 8.4.3, Node A=master,
>   Node B=secondary, XEN 4.3.0 Guests on LVM
>
> ***
> root@srv-ldeb-xen001:/var/local/abbeoo/xen.d# /etc/init.d/drbd status
> drbd driver loaded OK; device status:
> version: 8.4.3 (api:1/proto:86-101)
> srcversion: 19422058F8A2D4AC0C8EF09
> m:res             cs         ro                 ds                 p  mounted  fstype
> 11:drbd_host11    Connected  Primary/Secondary  UpToDate/UpToDate  C
> ***
>
> What I did is to check if synchronization works. I tested as follows:
>
> On Node A:
> 1) Create a file and save it in Desktop in my XEN Guest System
> 2) xl destroy host11 (kill my XEN Guest System)
> 3) drbdadm secondary drbd_host11
>
> Then I continue on Node B:
> 1) drbdadm primary drbd_host11
> 2) xl create host11.cfg (Start my XEN Guest System)
> 3) No file on the desktop!!!
>
> It seems that the synchronization doesn't work. When I shut down Node B and
> start Node A again, I see the file again on the desktop.
>
> Can anyone help me out here? Do I make any mistake while starting DRBD at
> boot?
>
> Thanks a lot,
> Walter
>
> My Nodes are linked and working
>
> ***
> root@srv-ldeb-xen001:/var/local/abbeoo/xen.d# cat /proc/drbd
> version: 8.4.3 (api:1/proto:86-101)
> srcversion: 19422058F8A2D4AC0C8EF09
> 11: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> ***
>
> ***
> root@srv-ldeb-xen002:/var/local/xen/xen.d# cat /proc/drbd
> version: 8.3.13 (api:88/proto:86-96)
> srcversion: ECB278A2285B40525B8362B
> 11: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> ***
>
> *** PING on Node A to Node B
> root@srv-ldeb-xen001:/var/local/abbeoo/xen.d# ping 10.255.255.2
> PING 10.255.255.2 (10.255.255.2) 56(84) bytes of data.
> 64 bytes from 10.255.255.2: icmp_req=1 ttl=64 time=0.195 ms
> ***
>
> *** Ping on Node B to Node A
> root@srv-ldeb-xen002:/var/local/xen/xen.d# ping 10.255.255.1
> PING 10.255.255.1 (10.255.255.1) 56(84) bytes of data.
> 64 bytes from 10.255.255.1: icmp_req=1 ttl=64 time=0.277 ms
> ***

All the read and write counters are showing zeros. Are you sure Xen is using
the drbd resources and not the underlying storage? Xen should be accessing
/dev/drbd11. My guess is that it's looking at /dev/vgdata1/host11.

Dan
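The counter check Dan describes in this thread is easy to script. A minimal sketch in Python, assuming a single resource in /proc/drbd-style text; the function name `io_counters` is hypothetical, and the sample line is Node A's output as posted.

```python
import re

def io_counters(proc_drbd_text):
    """Extract the ns/nr/dw/dr counters from /proc/drbd-style text.

    Handles one resource only; with several resources the later values
    would overwrite the earlier ones.
    """
    pairs = re.findall(r"\b(ns|nr|dw|dr):(\d+)", proc_drbd_text)
    return {name: int(value) for name, value in pairs}

# Node A's status as posted in the thread.
NODE_A = (
    "11: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r- "
    "ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0"
)

counters = io_counters(NODE_A)
if counters["dw"] == 0 and counters["ns"] == 0:
    print("nothing has been written through this DRBD device:", counters)
```

A Primary that has hosted a running guest but shows no disk writes (dw) and nothing sent over the network (ns) points the same way as the reply: the guest's I/O is probably going to the backing volume directly instead of through /dev/drbd11.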
Re: [DRBD-user] Upgrading from 8.3.11-2 to 8.3.15-2
No worries. The servers negotiate a common protocol. You can even jump to 8.4.whatever.

Dan

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Rick Cone
Sent: Saturday, June 22, 2013 4:27 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Upgrading from 8.3.11-2 to 8.3.15-2

Hello,

I'm going to upgrade from 8.3.11-2 to 8.3.15-2. Just a routine question: is there any issue with running one node as 8.3.11-2 and the other as 8.3.15-2 for a few hours (maybe 12 hours)? I want to upgrade the secondary system and let it run for a while, then fail over and upgrade the primary late tonight. Just curious whether there would be issues with replication while in that mode.

Thanks!
Rick
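To make "negotiate a common protocol" concrete: each node advertises a protocol range in its /proc/drbd header, for example proto:86-101 on an 8.4.3 node and proto:86-96 on an 8.3.13 node. My understanding, stated as an assumption rather than taken from this thread, is that the peers settle on the highest version both support. A small sketch of that overlap calculation, with a made-up function name:

```shell
# Hypothetical helper: given two "min-max" protocol ranges, print the highest
# protocol version common to both, or fail if the ranges do not overlap.
common_proto() {
    a_min=${1%-*}; a_max=${1#*-}
    b_min=${2%-*}; b_max=${2#*-}
    lo=$(( a_min > b_min ? a_min : b_min ))   # highest of the two minimums
    hi=$(( a_max < b_max ? a_max : b_max ))   # lowest of the two maximums
    [ "$lo" -le "$hi" ] || return 1           # no common version
    echo "$hi"
}

common_proto 86-101 86-96
```

The ranges for 8.3.11 and 8.3.15 are not quoted in the thread, so compare the proto: values from your own nodes.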
Re: [DRBD-user] drbdadm verify resource with cron
Roberto:

You are not looking for a return code, and the oos counter in /proc/drbd might not go up for hours. What you want is to configure the out-of-sync handler. Mine (verify run from cron weekly, one resource per night) says:

out-of-sync /usr/lib/drbd/notify-out-of-sync.sh myem...@address.tld;

When I get one of these emails, I can remedy the situation: disconnect/connect to resync, replace a drive if it's bad, whatever. It only sends the email about once per year (not using RAID).

Dan in Atlanta

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Roberto Fastec
Sent: Thursday, June 20, 2013 6:56 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] drbdadm verify resource with cron

Dear readers,

I have configured drbd with a resource syncer. Since everything is running fine, the command

#drbdadm verify resource

outputs nothing. I'm wondering what the output would be if some re-sync were needed; I mean to use it with cron.

Thank you for any hints,
Roberto
Re: [DRBD-user] drbdadm verify resource with cron
I think so, but it only happens rarely. Doesn't really matter.

Dan

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of AZ 9901
Sent: Friday, June 21, 2013 10:53 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] drbdadm verify resource with cron

2013/6/21 Dan Barker dbar...@visioncomm.net:
> What you want is to configure the Out Of Sync handler. Mine (cron to run weekly, one resource per night) says:
> out-of-sync /usr/lib/drbd/notify-out-of-sync.sh myem...@address.tld;

Do both nodes send the mail, or only one?

Thank you!
Ben
Re: [DRBD-user] drbdadm verify resource with cron
There is no such thing as in sync or out of sync if the resource is standalone. It simply is what it is. Dan From: andreas graeper [mailto:agrae...@googlemail.com] Sent: Friday, June 21, 2013 11:09 AM To: Dan Barker Subject: Re: [DRBD-user] drbdadm verify resource with cron hi, when a drdb-device is standalone + primary, what tells me oss 0 ? dangerous or is it just a difference to the state when it was connected last time ? thanks andreas 2013/6/21 Dan Barker dbar...@visioncomm.netmailto:dbar...@visioncomm.net Roberto: You are not looking for a return code, and the oos counter in /proc/drbd might not go up for hours. What you want is to configure the Out Of Sync handler. Mine (cron to run weekly, one resource per night) says: out-of-sync /usr/lib/drbd/notify-out-of-sync.sh myem...@address.tldmailto:myem...@address.tld; When I get these email, I can remedy the situation; disconnect/connect to resync, replace a drive if it's bad, whatever. It only sends the email about once per year (not using RAID). Dan in Atlanta From: drbd-user-boun...@lists.linbit.commailto:drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.commailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Roberto Fastec Sent: Thursday, June 20, 2013 6:56 PM To: drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com Subject: [DRBD-user] drbdadm verify resource with cron Dear readers I have configured drbd with resource syncer Since everything is running fine, the command #drbdadm verify resource outputs nothing I'm wondering what the output could be if some re-sync is needed, this is meant to use it with cron. Thank you for hinting Roberto ___ drbd-user mailing list drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user attachment: winmail.dat___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Replication problems constants with DRBD 8.3.10
The suggestion is to replace the actual RealTek NIC with an Intel NIC or some other dependable brand, not to use different drivers on the hardware you have. Clear as mud? Dan (top poster) in Atlanta -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user- boun...@lists.linbit.com] On Behalf Of cesar Sent: Monday, June 17, 2013 10:34 AM To: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Replication problems constants with DRBD 8.3.10 Hi to all people Excuse me if I ask a question from rookie But tell me that: the in-kernel realtek drivers of the 2.6.32 kernel were known to cause troubles. I still have to replace them for optimum stability. try to install the current version of the driver from realtek ot switch to Intel-NICs So, I don't know how do it, and in this link http://packages.debian.org/squeeze/firmware-realtek say for download the drivers for Realtek and Debian, but neither know if these drivers are more modern that the pve kernel. I have a Kernel of RHEL 6 modified by the authors ans SO Debian squeeze And to make matters worse, in this country (Paraguay) is not easy to buy another brand of NICs. The model of NIC is RTL8168E and lshw shows firmware=rtl_nic/rtl8168e-2.fw and the link of debian (shown above) shows among other models: * Realtek RTL8111D-1/RTL8168D-1 firmware (rtl_nic/rtl8168d-1.fw) * Realtek RTL8111D-2/RTL8168D-2 firmware (rtl_nic/rtl8168d-2.fw) * Realtek RTL8168E-1 firmware (rtl_nic/rtl8168e-1.fw) * Realtek RTL8168E-2 firmware (rtl_nic/rtl8168e-2.fw) * Realtek RTL8168E-3 firmware (rtl_nic/rtl8168e-3.fw) Can anyone help me with this problem. Best regards Cesar -- View this message in context: http://drbd.10923.n7.nabble.com/Replication- problems-constants-with-DRBD-8-3-10-tp17896p17913.html Sent from the DRBD - User mailing list archive at Nabble.com. 
Re: [DRBD-user] drbd+mysql+innodb
rsync will not be able to synchronize from a failed disk, drbd already has done so. Dan in Atlanta From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Robinson, Eric Sent: Wednesday, June 12, 2013 6:20 PM To: Dirk Bonenkamp - ProActive Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] drbd+mysql+innodb Hi Dirk - Thanks for the feedback. I do need some clarification, though. DRBD replicates disk block changes to a standby volume. If the primary node suddenly fails, the cluster manager promotes the standby node to primary and starts the MySQL service. Logically, this seems exactly the same as simply rsyncing the data to the new server and starting the MySQL service. Why would it work with DRBD but not with rsync? Thanks for your patience while I explore this. Note: we have over 500 separate MySQL database instances using MyISAM. I am totally not stoked about the idea of using 300% more disk space and gobs more memory. -- Eric Robinson From: Dirk Bonenkamp - ProActive [mailto:d...@proactive.nl] Sent: Wednesday, June 12, 2013 7:24 AM To: Robinson, Eric Cc: drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com Subject: Re: [DRBD-user] drbd+mysql+innodb Hi Eric, We did the same conversion about a year ago. We run MySQL with InnoDB on a DRDB back-end. There's alot of stuff that's different between MyISAM and InnoDB, but the DRBD thing is the same. What you say about backups is correct, but this hasn't anything to do with DRBD. DRDB will do fine, some other quick non-DRDB things: - MySQL tuning is (even more) essential with InnoDB. - InnoDB tables use (a lot) more diskspace than MyISAM, our disk usage was nearly 300% of MyISAM's usage for the same dataset. - If you want performance, you want to be able to load your dataset in memory. Kind regards, Dirk Op 12-6-2013 15:44, Robinson, Eric schreef: We have been a MyISAM shop forever but we are considering switching to innodb. 
There is scant information available on using innodb with drbd. Are there special considerations and pitfalls? I have been told that it is not possible to backup innodb by doing a simple rsync of the data directory to another server like we can do with myisam. If that is true, what does that say about using innodb with drbd, which does essentially the same thing? -- Eric Robinson Disclaimer - June 12, 2013 This email and any files transmitted with it are confidential and intended solely for 'drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com'. If you are not the named addressee you should not disseminate, distribute, copy or alter this email. Any views or opinions presented in this email are solely those of the author and might not represent those of Physicians' Managed Care or Physician Select Management. Warning: Although Physicians' Managed Care or Physician Select Management has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachments. ___ drbd-user mailing list drbd-user@lists.linbit.commailto:drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user -- [cid:image001.gif@01CE67AC.9C400440]http://www.proactive.nl T 023 - 5422299 www.proactive.nlhttp://www.proactive.nl [cid:image002.gif@01CE67AC.9C400440]http://nl.linkedin.com/in/dirkbonenkamp [My status]skype:dirkbonenkamp?call inline: image001.gifinline: image002.gifinline: image003.png___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] promoting one of new disks
Yes, you can assume all zeros on new disks and skip the first sync. You can even do it with dirty disks, as long as the file systems are new, but the first verify will be a deusy. If you'd said what version of DRBD you were running, I'd give you the link in the manual for the correct command. But there are different manual and command syntaxes for 8.3 vs 8.4. Look in the appropriate doc for clear-bitmap. Dan From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of andreas graeper Sent: Wednesday, May 29, 2013 7:19 AM To: drbd-user@lists.linbit.com Subject: [DRBD-user] promoting one of new disks hi, pure drbd, no corosync/pacemaker (first attempt, i started to read about lvm,cluster,... some days ago) when drbd on top of lv (lvm-logical-volumn) and on both nodes the lv are formatted with mkfs.ext4, cause of error 40 i had to zero the start of lv. after 'create-md' and 'up' on both nodes (cs:Connected ro=Sec/Sec ds=Inconsistant/Inconsistant) now i tried drbdadm --force primary r0 but i got error 17 and found in www a solution drbdadm invalidate r0 now automatically sync starts (cs:SyncTarget ro=Sec/Sec ds=Incons/Uptodate) the peer is implicit declared uptodate this way ? i should have invalidated the peer if i want to set local node to be primary ? (in case there are actually data on the node that shall become primary) but the main question: can i avoid the sync process, if on new disks there are no data ? sync is done (cs:Connected ro=Sec/Sec ds=Uptodate/Uptodate) i read, when brbd with pacemaker, then (i guess the resource-manager forces this) only on primary i have rw-access, and on secondary i cannot even read the mirrored data. i found the data (lv) already mounted on primary. i created a small file in /mnt/lva (mountpoint declared in /etc/fstab) now i want to change roles (manual 6.5 basic manual failover). `mount` or `df -h` does not show the mounted device. 
umount /dev/vg_xxx/lv_aaa = not mounted umount /mnt/lva = not mounted umount /dev/mapper/vg_xxx-lv_aaa = not mounted i simply tried (on current primary) : drbdadm secondary r cd /mnt/lva # change on that small file /proc/drbd - ds:uptodate/uptodate on secondary : drbdadm primary r - ro:Primary/Secondary ds:uptodate/uptodate (roles changed) on new primary i cannot mount the volumn cause its busy on old primary i can still see the new file now i stopped drbd on old primary and tried to mount the lv but it cannot mounted as ext4 anymore, cause it is damaged by the call of dd ?! how this is done correct: brbd on top of lv with internal metadata ? ? i must not zero the head of the formatted lv ? i have to force writing metadata to the end of that lv thanks in advance andreas ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
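Since the reply above leaves the exact clear-bitmap command to the manual, here is a sketch of the two forms as I remember them from the 8.3 and 8.4 user guides, in the section on skipping the initial resynchronization. Treat both strings as something to check against the guide for your version. The helper name is made up, and it only prints the command rather than running it.

```shell
# Hypothetical helper: print the "skip initial sync" command for a DRBD
# version and resource name. Nothing is executed.
skip_initial_sync_cmd() {
    version=$1 resource=$2
    case "$version" in
        8.3*) echo "drbdadm -- --clear-bitmap new-current-uuid $resource" ;;
        8.4*) echo "drbdadm new-current-uuid --clear-bitmap $resource" ;;
        *)    return 1 ;;   # other versions: consult the matching manual
    esac
}

skip_initial_sync_cmd 8.3.13 r0
```

As I understand it, the command is issued on one node only, with both nodes Connected and Inconsistent/Inconsistent, and is only safe when neither backing device holds data you care about.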
Re: [DRBD-user] Dumb question
The only dumb questions are the unasked question and the question asked a second time <g>.

If you have run a verify, then all the out-of-sync blocks are marked. cat /proc/drbd should show a very large number of oos (Out Of Sync) blocks. Simply disconnect/reconnect the resource and it will resync the out-of-sync blocks in the proper direction. The disconnect/reconnect can be done on the Primary or the Secondary; it doesn't matter.

Forcing a full sync would sync all blocks, but your verify has already determined which ones need to be updated, so the disconnect/reconnect will be faster. It will still be quite a lot of data, I'd imagine.

You will want to do the disconnect/reconnect at a low-use time of day, but I'm not sure waiting for the weekend is a good idea. You have no reliable redundancy until the oos count is zero.

Verify only marks oos blocks and produces a message; it doesn't change the status of the disks.

hth
Dan

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Prater, James K.
Sent: Monday, May 20, 2013 7:19 AM
To: Lars Ellenberg; drbd-user@lists.linbit.com
Subject: [DRBD-user] Dumb question

Hello Lars,

I have a real dumb question. I have created mirrors between two peers (active/passive) but did not do the initial sync. I was going to wait until the weekend to do that. However, I had forgotten that I had placed drbdadm verify all in my cron, and now it is verifying. It had marked the volumes UpToDate. My question: are these volumes really UpToDate, or will I have to run drbdadm primary --force, or actually invalidate one and then force the synchronization?

Thanks
James
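To watch the oos count Dan mentions, the counters can be pulled out of /proc/drbd-style text. This is a minimal sketch: the function name is made up, and it assumes the 8.x layout where each device's counter line contains a field like oos:0.

```shell
# Hypothetical helper: sum every "oos:<n>" field found on stdin.
oos_total() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^oos:[0-9]+$/) { split($i, kv, ":"); sum += kv[2] }
    } END { print sum + 0 }'
}

# Example use on a live node (not run here):
#   oos_total < /proc/drbd
# After a verify that found differences, resync just the marked blocks with:
#   drbdadm disconnect <resource> && drbdadm connect <resource>
```

A result of 0 after the resync finishes is the point at which the redundancy can be trusted again.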
Re: [DRBD-user] Not able to test Automatic split brain recovery policies
-Original Message- From: Shailesh Vaidya [mailto:shailesh_vai...@persistent.co.in] Sent: Thursday, April 11, 2013 1:50 AM To: Digimer Cc: Dan Barker; drbd-user@lists.linbit.com Subject: RE: [DRBD-user] Not able to test Automatic split brain recovery policies Hi Digimer, Thanks for help and explanation. I will try it out fencing option. However, I would like to validate if what I am testing for split-brain is correct or not. Also what could be done for simple split-brain auto- recovery through configuration without fencing. There is no simple split-brain recovery. Split Brain only occurs after an error of some sort causing two different nodes to write to the same resource while disconnected. Anything other than manual recovery of files or blocks will lose data. In many cases, it's not even possible to determine what data is being lost or how to recover it. You just have to pick the lesser of two evils and move forward, honoring the writes to one node and discarding the writes done on the other. Most applications and file systems react poorly to having writes of theirs discarded. Any effort spent automating the recovery of a split-brain could better be spent identifying how your configuration created the split brain, usually dual primary without sufficient controls in place to prevent split brain in the first place. ymmv Dan Regards, Shailesh Vaidya -Original Message- From: Digimer [mailto:li...@alteeve.ca] Sent: Wednesday, April 10, 2013 11:17 PM To: Shailesh Vaidya Cc: Dan Barker; drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Not able to test Automatic split brain recovery policies I've not done fencing in DRBD alone, so I am unable to offer specific suggestions. I can speak to generally what you need though. You can set DRBD's fencing policy to 'resource-and-stonith'. What this does is tell DRBD When you lose your peer, block IO and call a fence against it. The fence action reaches out (usually via IPMI or managed PDU) and forces the peer offline. 
After that, the surviving node will proceed. This way, at no time will both nodes be operating in StandAlone and Primary. You will want to set a delay so that one of the nodes has a head start when trying to fence the other. This way, in your test, when the communication breaks but the nodes are still up, you remove the risk of both nodes being fenced. What this does is say when you want to fence node 1, wait 15 seconds before doing so. when you want to fence node 2, don't wait and immediately fence. Thus, when it's a break in communications, you can predict which node will win the fence. When a node really fails, it will obviously not try to fence, being dead, so the healthy node will always win the fence and then take over. How you actually fence the peer will depends on what options you have available. Then you need a script that will actually do the work of reaching out and killing the peer. As I mentioned, this is usually done via IPMI (or branded out of band interfaces like iLO, DRAC, RSA, etc) or by using managed PDUs, like the APC AP7900. To do this, you need to have a scrip that reads certain environment variables set by DRBD, executes the request and then returns an appropriate exit code based on success or failure. I wrote such a fence handler called rhcs_fence (based on obliterate-peer.sh) which handles fencing by passing the request up to rhcs. You should be able to fairly easily adapt it to work with your setup. https://github.com/digimer/rhcs_fence Hope this helps clarify things. digimer On 04/10/2013 01:22 PM, Shailesh Vaidya wrote: Hi Don, Yup 8.3.8 is quit old but need to work with it for now. I am not using fencing and neither pacemaker or RHCS What I observed is that after split-brain its getting disconnected and dropped connection. both became unknown to each other. I am not sure is this issue with my test procedure itself. Do I need to make any additional configuration. Thanks, Shailesh Vaidya. 
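The fence delay digimer describes can be pictured as a lookup keyed on the node about to be fenced. This is only a sketch of the idea: the node names and the function are made up, and a real setup would normally configure the delay in the fence agent or cluster manager rather than in a hand-written script.

```shell
# Hypothetical helper: seconds to wait before fencing the named target.
# Fencing node1 is delayed, so node1 wins the race when only the link breaks.
fence_delay_for() {
    case "$1" in
        node1) echo 15 ;;   # wait 15 seconds before fencing node 1
        node2) echo 0 ;;    # fence node 2 immediately
        *)     return 1 ;;  # unknown target: refuse to guess
    esac
}

# Example use inside a fence handler (not run here):
#   sleep "$(fence_delay_for "$target")" && <call IPMI or PDU to power off $target>
```

With this asymmetry a pure communication break always ends with node 2 fenced, while a genuinely dead node never gets to run its handler at all.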
From: Digimer [li...@alteeve.ca] Sent: Wednesday, April 10, 2013 8:11 PM To: Shailesh Vaidya Cc: Dan Barker; drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Not able to test Automatic split brain recovery policies To your immediate problem; If you had configured fencing, drbd would not split-brain. Are you using pacemaker or RHCS? Secondly, 8.3.8 is very, very old. Upgrading to a newer 8.3.x version would be a good idea. Back to split-brain; DRBD declares a split-brain as soon as both nodes are StandAlone and Primary. To recover, you need to tell DRBD which node to consider good and then drop the changes on the peer and let the good node sync to the other node. On 04/10/2013 08:08 AM, Shailesh Vaidya wrote: I have followed same procedure (disable Ethernet card) etc and after that drbd status on both the nodes
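The manual recovery digimer outlines (pick the good node, drop the changes on the peer, let the good node sync over) looks roughly like this on DRBD 8.3, as far as I recall from the user guide's split-brain recovery section. The helper name and the role labels are made up, and it only prints the commands so they can be reviewed first.

```shell
# Hypothetical helper: print the 8.3-style manual split-brain recovery
# commands for one side. "victim" is the node whose changes are discarded.
split_brain_recovery_cmds() {
    role=$1 res=$2
    case "$role" in
        victim)
            echo "drbdadm secondary $res"
            echo "drbdadm -- --discard-my-data connect $res"
            ;;
        survivor)
            # Only needed if the survivor is also StandAlone
            echo "drbdadm connect $res"
            ;;
        *) return 1 ;;
    esac
}

split_brain_recovery_cmds victim r0
```

On 8.4 the option moves after the subcommand (drbdadm connect --discard-my-data), so check the guide that matches the installed version.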
Re: [DRBD-user] Not able to test Automatic split brain recovery policies
You don't show the status of the nodes, but I imagine you have two primary nodes. There is no handler specified for two primary nodes. Did you have two primary, disconnected nodes? It shouldn't be possible to create split brain without writing on both nodes.

Dan

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Shailesh Vaidya
Sent: Wednesday, April 10, 2013 1:58 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Not able to test Automatic split brain recovery policies

Hello,

I am using DRBD 8.3.8. I have configured automatic split-brain recovery policies as below in /etc/drbd.conf:

    net {
        max-buffers 2048;
        ko-count 4;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
    }

Both of my machines are virtual machines, so they do not have an actual back-to-back connection. To reproduce split-brain, I am using the procedure below:

1. On the Primary, disable the Ethernet card from 'Virtual Machine properties'.
2. Wait for the Secondary to start the switchover, then enable the Ethernet card on the Primary again.

The log shows me that split-brain occurred; however, it shows the connection was dropped.

    Apr 9 10:30:15 drbd1 kernel: block drbd0: uuid_compare()=100 by rule 90
    Apr 9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
    Apr 9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
    Apr 9 10:30:15 drbd1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
    Apr 9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
    Apr 9 10:30:15 drbd1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
    Apr 9 10:30:15 drbd1 kernel: block drbd0: conn( WFReportParams -> Disconnecting )

Full DRBD conf file:

    [root@drbd1 ~]# cat /etc/drbd.conf
    global { usage-count no; }
    resource r0 {
        protocol C;
        #incon-degr-cmd echo !DRBD! pri on incon-degr | wall ; sleep 60 ; halt -f;
        on drbd1 {
            device /dev/drbd0;
            disk /dev/sda3;
            address 10.55.199.51:7789;
            meta-disk internal;
        }
        on drbd2 {
            device /dev/drbd0;
            disk /dev/sda3;
            address 10.55.199.52:7789;
            meta-disk internal;
        }
        disk {
            on-io-error detach;
        }
        net {
            max-buffers 2048;
            ko-count 4;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
        }
        syncer {
            rate 25M;
            al-extents 257; # must be a prime number
        }
        startup {
            wfc-timeout 20;
            degr-wfc-timeout 120; # 2 minutes.
        }
    }

Is this a configuration issue, or is my testing procedure not proper?

Regards,
Shailesh Vaidya
Re: [DRBD-user] does drbd act like a loop device?
I have been using it this way for over a year now without issue.

On 03/27/2013 03:40 AM, Maurits van de Lande wrote:
> Hello,
>
> I would like to use a volume cached with flashcache as a drbd backing device with drbd 8.3. In order for flashcache to work there should not be a loop device in the storage path. Like:
>
> /dev/sdb1 regular disk raid6 partition
> /dev/sdc1 SSD based raid 1 partition
>
> Then I use
>
> flashcache_create -p back cachedev /dev/sdc1 /dev/sdb1
>
> to create the cached disk. Drbd will use /dev/mapper/cachedev.
>
> Does drbd act like a loop device? Has anybody used drbd in this way? (This is a different setup from the one Florian Haas used with drbd 8.4.)
>
> Best regards,
> Maurits van de Lande
Re: [DRBD-user] Uncatchable DRBD out-of-sync issue
-Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user- boun...@lists.linbit.com] On Behalf Of Stanislav German-Evtushenko Sent: Monday, March 25, 2013 10:13 AM To: Radu Radutiu Subject: Re: [DRBD-user] Uncatchable DRBD out-of-sync issue Thank you for suggestion. I could investigate if this is a swap region but it wouldn't help because: 1) I can check it for Linux VMs but I can't do the same for Windows ones (because swap file is in file system). 2) I can't do online migration even if only swap region is out of sync because it will make a VM unstable. On Mon, Mar 25, 2013 at 5:07 PM, Radu Radutiu rradu...@gmail.com wrote: I was asking if the out of sync blocks belong to a swap partition of one of the virtual machines. I see exactly the same problem with a setup more or less similar to your setup (my setup is an active-passive one, lvm on top dbrb, with kvm virtual machines using these logical volumes as storage). I seem to recall some older messages on drbd list stating that it might be OK to have oos blocks for the swap device. Best Regards, Radu You don't need Dual-Primary to live migrate - you need shared storage. Two completely different concepts. Now, the shared storage can be based on a DRBD Primary, and you can have it fail over to a DRBD Secondary, but dual primary is not going to do what you want and shared storage will. The storage you share may be virtualized, if you like. Dan ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Uncatchable DRBD out-of-sync issue
Stanislav, my system sends me an email when verify finds an out-of-sync condition. You can use the same handler if you like. In my global, handlers section: out-of-sync /usr/lib/drbd/notify-out-of-sync.sh myemailaddress; Are you resyncing after the error is detected (disconnect/connect the resource)? Dan, in Atlanta From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Stanislav German-Evtushenko Sent: Sunday, March 24, 2013 7:00 AM To: drbd-user@lists.linbit.com Subject: [DRBD-user] Uncatchable DRBD out-of-sync issue Dear all, I'm trying to catch the issue with out-of-sync and I've stuck so far. Can anybody give me a hint what can I check next? Configuration: - two nodes Dell PowerEdge R710 (both nodes of the same hadrware, same configuration) - drbd0 master-master (size is 900GiB) - direct connection (two 1Gbit/s ethernet adapters in bonding balance-rr) - data-integrity-alg is crc32c (it has been enabled for testing purposes) - LVM on top of DRBD (LVM volumes are used by virtual machines) Software: - DRBD module version: 8.3.13 - kernel: Linux 2.6.32-19-pve #1 SMP x86_64 GNU/Linux Problem: - Each time when I do online verification it founds some sectors are out of sync (not many usually, about 5-15 messages after verification is done) - In fact these sectors are not synced (checked with dd and md5sum) - data-integrity-alg doesn't cause any messages in logs since drbdadm is connected all and until verification process finds some sectors out of sync Questions: - How is that possible? - Why data-integrity-alg doesn't catch the problem? - How to fix? 
*** extracts from kernel log *** Mar 24 13:23:38 host1 kernel: block drbd0: conn( Connected - VerifyS ) Mar 24 13:23:38 host1 kernel: block drbd0: Starting Online Verify from sector 0 Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996928, size=8 (sectors) Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718996984, size=8 (sectors) Mar 24 14:13:17 host1 kernel: block drbd0: Out of sync: start=718997224, size=8 (sectors) * *** check with dd and md5sum *** # dd iflag=direct if=/dev/drbd0 bs=512 skip=718997224 count=8 | md5sum host1: 669a5c2ba22fa931aac16cdd2f03e22a host2: ceeac3bd59178ee13f94ce283e3a4de3 *** drbdadm /dev/drbd0 show *** disk { size0s _is_default; # bytes on-io-error pass_on _is_default; fencing dont-care _is_default; max-bio-bvecs 0 _is_default; } net { timeout 60 _is_default; # 1/10 seconds max-epoch-size 2048 _is_default; max-buffers 2048 _is_default; unplug-watermark128 _is_default; connect-int 10 _is_default; # seconds ping-int10 _is_default; # seconds sndbuf-size 0 _is_default; # bytes rcvbuf-size 0 _is_default; # bytes ko-count0 _is_default; allow-two-primaries; cram-hmac-alg sha1; shared-secret XXX; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect _is_default; rr-conflict disconnect _is_default; ping-timeout5 _is_default; # 1/10 seconds data-integrity-alg crc32c; on-congestion block _is_default; congestion-fill 0s _is_default; # byte congestion-extents 127 _is_default; } syncer { rate153600k; # bytes/second after -1 _is_default; al-extents 127 _is_default; verify-alg md5; on-no-data-accessible io-error _is_default; c-plan-ahead0 _is_default; # 1/10 seconds c-delay-target 10 _is_default; # 1/10 seconds c-fill-target 0s _is_default; # bytes c-max-rate 102400k _is_default; # bytes/second c-min-rate 4096k _is_default; # bytes/second } protocol C; _this_host { device minor 0; disk/dev/sda3; meta-disk internal; address ipv4 172.23.10.1:7788http://172.23.10.1:7788; } 
_remote_host { address ipv4 172.23.10.2:7788http://172.23.10.2:7788; } # (89) unknown tag = (integer) 0 [len: 4] # Found unknown tags, you should update your # userland tools *** Best regards, Stanislav ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
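The dd and md5sum check quoted above can be repeated for every range the verify reported. The sketch below has two made-up helpers: one pulls the start and size pairs out of the kernel log lines, the other hashes one range of a device. It assumes 512-byte sectors, as the original dd command does.

```shell
# Hypothetical helper: print "start size" for each "Out of sync" log line on stdin.
oos_ranges() {
    sed -n 's/.*Out of sync: start=\([0-9]*\), size=\([0-9]*\).*/\1 \2/p'
}

# Hypothetical helper: md5 of a sector range.
# $1 device or file, $2 start sector, $3 sector count, $4 optional extra dd
# operand such as iflag=direct (used in the thread to bypass the page cache).
sector_md5() {
    dd if="$1" bs=512 skip="$2" count="$3" ${4:+"$4"} 2>/dev/null | md5sum | cut -d' ' -f1
}

# Example use, run on both hosts and compared by eye (not run here):
#   grep 'Out of sync' /var/log/kern.log | oos_ranges |
#   while read -r start size; do
#       echo "$start $(sector_md5 /dev/drbd0 "$start" "$size" iflag=direct)"
#   done
```

The log path is an assumption. If the hashes differ between hosts for the same range, the blocks really are different on disk, which is what Stanislav observed.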
[DRBD-user] DRBD log files
Q. Is there one central place in DRBD where the log files are set up? Is stdout redirected from the screen to a log file? Where does the output of all the echo commands go?

Thanks,
Dan
Re: [DRBD-user] dump-md data
The dump could be quite large if the bitmap had a lot of non-zeros in it. As it is, all 9M words times 64 bits times 4096 bytes/bit cover your 2+ terabyte device, and all are zero, so it takes one line:

8923456 times 0x;

Take down your secondary, run for a few days, and then dump the md to get a respectably large dump <g>. [Don't]

Dan

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Olivier Le Cam
Sent: Monday, March 11, 2013 2:48 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] dump-md data

Hi -

By making a dump-md of a device of several TB, I expected to get a relatively large file. I realize that it is actually the opposite: the dump only contains a dozen lines, like the following.

    # DRBD meta data dump
    # 2013-03-11 18:38:21 +0100 [1363023501]
    # nfs-2 drbdmeta 0 v08 /dev/vg1/storage internal dump-md
    #
    version v08;

    # md_size_sect 139512
    # md_offset 2339289165824
    # al_offset 2339289133056
    # bm_offset 2339217739776

    uuid {
        0x7117E0379FF23460; 0x; 0x6273B7EE32734046; 0x6272B7EE32734047;
        flags 0x0091;
    }
    # al-extents 3389;
    la-size-sect 4568784648;
    bm-byte-per-bit 4096;
    device-uuid 0x64B2B985FCFD7314;
    la-peer-max-bio-size 131072;
    # bm-bytes 71387264;
    bm {
        # at 0kB
        8923456 times 0x;
    }
    # bits-set 0;

Is this normal, or did I miss something?

PS: the dump-md drbdadm command required that I first run drbdadm apply-al before I could dump the metadata.

Thanks and best regards,
--
Olivier
Re: [DRBD-user] drbd pacemaker scst/srp 2 node active/passive question
That's easy, I've been doing it for years, going back to ESXi 4.1 at least, maybe even to 4.0. I run ESXi 5.1 now.

Set up both the servers in ESXi, Configuration, Storage adapters. Use static discovery, because you can list the targets whether they exist or not. When the primary goes down, the secondary will come up (if it's available) on ESXi without intervention.

In my setup, the .46 drbd is secondary, and invisible to ESXi. .47 is primary and visible to ESXi. I run the following targets (you can do this with the GUI, but I get lazy):

vmkiscsi-tool -S -a 172.30.0.46 iqn.2012-05.com.visioncomm.DrbdR:Storage03 vmhba39
vmkiscsi-tool -S -a 172.30.0.46 iqn.2012-06.com.visioncomm.DrbdR:Storage02 vmhba39
vmkiscsi-tool -S -a 172.30.0.46 iqn.2012-08.com.visioncomm.DrbdR:Storage01 vmhba39
vmkiscsi-tool -S -a 172.30.0.46 iqn.2012-08.com.visioncomm.DrbdR:Storage00 vmhba39
vmkiscsi-tool -S -a 172.30.0.47 iqn.2012-05.com.visioncomm.DrbdR:Storage03 vmhba39
vmkiscsi-tool -S -a 172.30.0.47 iqn.2012-06.com.visioncomm.DrbdR:Storage02 vmhba39
vmkiscsi-tool -S -a 172.30.0.47 iqn.2012-08.com.visioncomm.DrbdR:Storage01 vmhba39
vmkiscsi-tool -S -a 172.30.0.47 iqn.2012-08.com.visioncomm.DrbdR:Storage00 vmhba39

If both are primary, I see 4 targets, 8 paths. This never happens. Usually, I see 4 targets, 4 paths.

I always do the switchover manually, so you might see slightly different results. My steps are:

1. Go primary on the .46 server.
2. Start the target (iscsi-target) software on the .46 server.
3. Rescan on all ESXi.
4. Stop the target software on the .47 server (ESXi fails over to the other path seamlessly at this point).
5. Stop drbd on .47 and do whatever maintenance was necessary.

To reverse: the same steps, but you can skip the scan if the ESXi hosts have seen both targets since boot. One shows up as active and the other shows up as dead, but the VMs don't care.
hth

Dan in Atlanta

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Jason Thomas
Sent: Thursday, February 28, 2013 9:50 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] drbd pacemaker scst/srp 2 node active/passive question

First time posting to a mailing list, hope I get this right.

I have a 2-node DRBD-backed SCST/SRP single-target (ib_srpt) setup working great using pacemaker/corosync. I am using this as the data store for a mail server. Where I am running into an issue is that the initiators are running on VMware ESXi 4.1 hosts. When a failover occurs on the target, the VM host initiators go dead and you have to rescan to pick up the target via the new path, causing the VM guest to go down until the new path is discovered. Hope that makes sense.

What I see as the potential problem is that lvm and scst are only active on the primary node, so the secondary node is undiscoverable by the ESXi host until it fails over. I am not sure what the answer is, but I am trying to figure out if it is possible to have:

1. on node1 (primary node): drbd (primary), lvm, scst with the target in read/write mode
2. on node2 (secondary node): drbd (secondary), lvm, scst with the target in read mode

and when node1 fails over, the node1 scst target goes read-only and the node2 scst target switches to read/write. What I am trying to achieve is the VM host seeing the target and paths at all times. Hopefully there is an easier solution to this and I am not making things more difficult. I have been researching this for weeks and am at the point of frustration. Any guidance would be appreciated.

Side note: I modified the SCSTTarget RA to work with ib_srpt, as it was not written for it originally and I did not find another RA out there specifically for my setup.

Thank you for any help you may be able to provide.
Setup: Initiator machines vmware ESXi 4.1 Target machines 2 nodes running CentOS 2.6.32-279.19.1.el6.x86_64 DRBD: kmod-drbd84-8.4.2-1.el6_3.elrepo.x86_64 Pacemaker/Corosync: pacemaker-libs-1.1.7-6.el6.x86_64 pacemaker-cli-1.1.7-6.el6.x86_64 pacemaker-1.1.7-6.el6.x86_64 pacemaker-cluster-libs-1.1.7-6.el6.x86_64 corosync-1.4.1-7.el6_3.1.x86_64 corosynclib-1.4.1-7.el6_3.1.x86_64 SCST/SRPT: scst-tools-2.6.32-279.19.1.el6-2.2.1-1.ab.x86_64 kernel-module-scst-iscsi-2.6.32-279.19.1.el6-2.2.1-1.ab.x86_64 kernel-module-scst-core-2.6.32-279.19.1.el6-2.2.1-1.ab.x86_64 kernel-module-scst-srpt-2.6.32-279.19.1.el6-2.2.1-1.ab.x86_64 scst config: HANDLER vdisk_fileio { DEVICE disk00 { filename /dev/drbd-stor/mail-stor nv_cache 1 } } TARGET_DRIVER ib_srpt { TARGET 0002:c902:0020:2020 { enabled 1 cpu_mask ff rel_tgt_id 1 GROUP data { LUN 0 disk00 INITIATOR 0x8102c902002020210002c903000f2bf3 INITIATOR 0x8102c902002020220002c903000f2bf3
Re: [DRBD-user] Device is held open by someone
Seems this is a common problem!!! Can you point those of us experiencing "Device is held open by someone" to a DRBD resource (documentation) that gives an overview in this area, for a better understanding of what may be going on? I will look at www.drbd.org for info.

Thanks, Dan Phillips

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Andreas Kurz
Sent: Thursday, February 28, 2013 5:07 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Device is held open by someone

On 2013-02-26 13:04, Felipe Gutierrez wrote:

Hi everyone,

I am trying to do a failover system only with drbd. When my primary node drops off the network, the secondary node becomes primary and I mount the filesystem.

secondary# drbdadm primary r7
secondary# mount /dev/drbd7 /mnt/drbd7/

Up to that point everything is ok. At this time, my old primary node has to become the secondary and I have to discard my changes.

primary# umount -l /mnt/drbd7
primary# drbdadm secondary r7
7: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 7 secondary' terminated with exit code 11
primary# drbdadm -- --discard-my-data connect r7

Does anyone have a hint?

It's always worth checking device-mapper:

dmsetup ls --tree -o inverted

Regards, Andreas
-- Need help with DRBD? http://www.hastexo.com/now

Thanks in advance!
Felipe
-- Felipe Oliveira Gutierrez
-- felipe.o.gutier...@gmail.com
-- https://sites.google.com/site/lipe82/Home/diaadia
Re: [DRBD-user] Device is held open by someone
Well, that's who's got it open. Task 7354, 27005 and 27174. See which you may be able to stop or kill.

Dan

From: Felipe Gutierrez [mailto:felipe.o.gutier...@gmail.com]
Sent: Wednesday, February 27, 2013 11:46 AM
To: Dan Barker
Subject: Re: [DRBD-user] Device is held open by someone

root@cloud15:/home/cloud15# lsof | grep drbd
lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system /home/cloud15/.gvfs
      Output information may be incomplete.
drbd7_wor  7354 root cwd DIR     8,2 4096 2 /
drbd7_wor  7354 root rtd DIR     8,2 4096 2 /
drbd7_wor  7354 root txt unknown          /proc/7354/exe
drbd7_rec 27005 root cwd DIR     8,2 4096 2 /
drbd7_rec 27005 root rtd DIR     8,2 4096 2 /
drbd7_rec 27005 root txt unknown          /proc/27005/exe
drbd7_ase 27174 root cwd DIR     8,2 4096 2 /
drbd7_ase 27174 root rtd DIR     8,2 4096 2 /
drbd7_ase 27174 root txt unknown          /proc/27174/exe

On Wed, Feb 27, 2013 at 1:28 PM, Dan Barker dbar...@visioncomm.net wrote:

And what did lsof | grep drbd say?

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Felipe Gutierrez
Sent: Wednesday, February 27, 2013 11:24 AM
To: Prater, James K.
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Device is held open by someone

Hi James,

Even after stopping Xen I couldn't umount my file system and set drbdadm secondary. This is my output:

root@cloud15:/home/cloud15# umount /mnt/drbd7/
umount: /mnt/drbd7: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
root@cloud15:/home/cloud15# drbd-overview
7:r7 StandAlone Primary/Unknown UpToDate/DUnknown r- /mnt/drbd7 ext3 23G 8.3G 14G 39%

Any hint? Thanks

On Wed, Feb 27, 2013 at 6:50 AM, Prater, James K. jpra...@draper.com wrote:

a separate system just for XEN. You are probably having some kernel-based conflicts that are blocking the release of the volume(s).

From: Felipe Gutierrez [mailto:felipe.o.gutier...@gmail.com]
Sent: Tuesday, February 26, 2013 04:57 PM
To: Arnold Krille arn...@arnoldarts.de
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Device is held open by someone

Hi Arnold,

I will try to stop Xen. Talking about stonith/fencing: I was working with Corosync+Pacemaker+Xen+DRBD, but the Pacemaker configuration failed when I put all components together. I mean, when I was running Corosync+Pacemaker+DRBD the fencing worked well! After I added Xen, the Pacemaker configuration failed. Now I am not using Corosync+Pacemaker anymore :( Do you have any clue for me about this?

Thanks in advance!
Felipe

On Tue, Feb 26, 2013 at 6:47 PM, Arnold Krille arn...@arnoldarts.de wrote:

On Tue, 26 Feb 2013 09:43:55 -0300 Felipe Gutierrez felipe.o.gutier...@gmail.com wrote:

No, it is not mounted. That is why I used the -l option on umount:

primary# umount -l /mnt/drbd7

I was saving files on this partition with the Xen hypervisor. If I test the same thing without Xen, everything works fine.

Well, then make xen stop when you have to switch over the primary. Or at least make xen stop using that directory. Could be it's still running vms from there, could be it's only still looking at the dir because it 'could' run vms from there.

If you or your cluster manager want to fail over the resource and that fails, it's a case for stonith/fencing. Or a case for a manual reboot if you haven't configured fencing yet.

I just have to know how to force it to become secondary. This time I rebooted the machine and got it to secondary. But I have to simulate it without rebooting.

Have fun, Arnold

-- Felipe Oliveira Gutierrez
-- felipe.o.gutier...@gmail.com
-- https://sites.google.com/site/lipe82/Home/diaadia
Re: [DRBD-user] Device is held open by someone
We are getting these two errors after a manual failover (fairly easy to recreate):

Feb 27 09:23:47 jamaica-a kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
(we know the max count is 20 now.)

Feb 26 00:36:03 jamaica-a kernel: drbd0: State change failed: Device is held open by someone

Dan Phillips

From: Phillips, Dan
Sent: Wednesday, February 27, 2013 2:07 PM
To: Dan Barker; drbd List (drbd-user@lists.linbit.com)
Cc: Phillips, Dan; Felipe Gutierrez
Subject: RE: [DRBD-user] Device is held open by someone

On our system:

[root@jamaica-a ~]# lsof | grep drbd
drbd0_wor 14557 root cwd DIR     9,2 1024 2 /
drbd0_wor 14557 root rtd DIR     9,2 1024 2 /
drbd0_wor 14557 root txt unknown          /proc/14557/exe
drbd0_rec 28332 root cwd DIR     9,2 1024 2 /
drbd0_rec 28332 root rtd DIR     9,2 1024 2 /
drbd0_rec 28332 root txt unknown          /proc/28332/exe
drbd0_ase 28333 root cwd DIR     9,2 1024 2 /
drbd0_ase 28333 root rtd DIR     9,2 1024 2 /
drbd0_ase 28333 root txt unknown          /proc/28333/exe

HERE are the processes that correspond to task IDs 14557, 28332, 28333:

[root@jamaica-a ~]# ps -aux | grep 14557
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7.2.7/FAQ
root 14557 0.0 0.0    0   0 ?     S  06:57 0:02 [drbd0_worker]
root 30919 0.0 0.0 3852 600 pts/3 S+ 13:54 0:00 grep 14557
[root@jamaica-a ~]# ps -aux | grep 28332
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7.2.7/FAQ
root 28332 0.0 0.0    0   0 ?     S  07:04 0:01 [drbd0_receiver]
root 31384 0.0 0.0 3848 588 pts/3 S+ 13:54 0:00 grep 28332
[root@jamaica-a ~]# ps -aux | grep 28333
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7.2.7/FAQ
root 28333 0.0 0.0    0   0 ?     S  07:04 0:02 [drbd0_asender]
root 31451 0.0 0.0 3848 592 pts/3 S+ 13:54 0:00 grep 28333

So drbd0_worker, drbd0_receiver and drbd0_asender have files open. After a failover, should these processes still have a device (or devices) held open? What does this tell us?
Thanks, Dan
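One possible reading of the lsof and ps output in this thread, offered as a sketch rather than a diagnosis: entries that `ps aux` shows with zero VSZ and RSS and a name in square brackets are typically kernel threads, and the names here (worker, receiver, asender) look like DRBD's own per-device threads. If that reading holds, the lsof lines only list each thread's cwd, root and exe entries, so they are not evidence of which user-space process has /dev/drbdN open. The helper names below are hypothetical, and the sample text is copied from the ps output quoted earlier in the thread.

```python
import re
from typing import List

# Sample lines copied from the ps output quoted in this thread.
PS_OUTPUT = """\
root 14557 0.0 0.0 0 0 ? S 06:57 0:02 [drbd0_worker]
root 28332 0.0 0.0 0 0 ? S 07:04 0:01 [drbd0_receiver]
root 28333 0.0 0.0 0 0 ? S 07:04 0:02 [drbd0_asender]
root 30919 0.0 0.0 3852 600 pts/3 S+ 13:54 0:00 grep 14557
"""


def looks_like_kernel_thread(ps_line: str) -> bool:
    """Heuristic: zero VSZ and RSS plus a [bracketed] command name."""
    # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    fields = ps_line.split(None, 10)
    if len(fields) < 11:
        return False
    vsz, rss, command = fields[4], fields[5], fields[10].strip()
    return vsz == "0" and rss == "0" and re.fullmatch(r"\[.+\]", command) is not None


def kernel_thread_names(ps_output: str) -> List[str]:
    """Names (without brackets) of lines that match the heuristic."""
    names = []
    for line in ps_output.splitlines():
        if looks_like_kernel_thread(line):
            names.append(line.split(None, 10)[10].strip()[1:-1])
    return names


if __name__ == "__main__":
    for name in kernel_thread_names(PS_OUTPUT):
        print(name, "looks like a kernel thread, not a user-space holder")
```

This is a heuristic only; it says nothing about who actually holds the device, which the thread elsewhere suggests checking with tools such as fuser or dmsetup.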
Re: [DRBD-user] Device is held open by someone
We have been working on the same or a similar "Device is held open by someone" issue for some time now. It occurs on a fairly regular basis upon manual failover.

[root@jamaica-a logs]# tail -f /var/log/messages
Feb 8 04:59:35 jamaica-a kernel: ide: failed opcode was: unknown
Feb 8 04:59:35 jamaica-a kernel: drbd0: State change failed: Device is held open by someone
Feb 8 04:59:35 jamaica-a kernel: drbd0: state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- }
Feb 8 04:59:35 jamaica-a kernel: drbd0: wanted = { cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate r--- }

Dan Phillips

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Felipe Gutierrez
Sent: Tuesday, February 26, 2013 7:50 AM
To: Prater, James K.
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Device is held open by someone

Like this:

# cat /etc/drbd.d/r7.res
resource r7 {
  on cloud15 {
    device /dev/drbd7;
    disk /dev/vg_7/lv_7;
    address 192.168.188.15:7789;
    meta-disk internal;
  }
  on cloud16 {
    device /dev/drbd7;
    disk /dev/vg_7/lv_7;
    address 192.168.188.16:7789;
    meta-disk internal;
  }
}

On Tue, Feb 26, 2013 at 9:49 AM, Prater, James K. jpra...@draper.com wrote:

How are the drbd volume(s) used?

James

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Felipe Gutierrez
Sent: Tuesday, February 26, 2013 7:05 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Device is held open by someone

Hi everyone,

I am trying to do a failover system only with drbd. When my primary node drops off the network, the secondary node becomes primary and I mount the filesystem.

secondary# drbdadm primary r7
secondary# mount /dev/drbd7 /mnt/drbd7/

Up to that point everything is ok. At this time, my old primary node has to become the secondary and I have to discard my changes.

primary# umount -l /mnt/drbd7
primary# drbdadm secondary r7
7: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 7 secondary' terminated with exit code 11
primary# drbdadm -- --discard-my-data connect r7

Does anyone have a hint?

Thanks in advance!
Felipe
-- Felipe Oliveira Gutierrez
-- felipe.o.gutier...@gmail.com
-- https://sites.google.com/site/lipe82/Home/diaadia
Re: [DRBD-user] Expanding a cluster
Justin: I would suggest: Swap ALL drives in one server with 2T drives, build the new RAID array, let that sync. You have a backup in the 1T drives you pulled. Ditto for the Primary. You miss the rebuild, you only do the sync. A rebuild reads EVERY sector, regardless of whether it's in use; just asking for a failure on that many drives - and - you want to do that 32 times! Please don't. The only exposure in doing all 16 drives at one time is that there is a single copy of any changes that take place after you disconnect the servers until the sync completes. If a catastrophe occurs during that period, you have the original 16 drives as a fall back. Another issue is you miss the opportunity to reorg into two, 8-drive arrays as Adam suggests. Hey, I bet all your current data will fit onto 8, 2T drives. You could do both at the same time. Disconnect, pull 16 1T, add 16 2T, build 2 arrays of 8 drives each, sync drbd to only one of them. Switch to the other server, repeat on the first, and then migrate at your leisure half of the load from the first 8-disk array to the second. Dan top poster in Atlanta -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Marcelo Pereira Sent: Thursday, January 31, 2013 3:32 PM To: Adam Goryachev; Justin Edmands Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Expanding a cluster Hi Adam, I'm sorry but it wasn't supposed to be an off-topic. I have been checking all the phases of this process, and RAID is something I was checking as well. What I wanted to know was really the DRBD side, as I know that this expansion will affect the block numbers and so on. That is why I wanted to know if DRBD would handle it okay, and how! Thanks ALL, for the messages! I will check the version numbers and publish the results here. And I will RTM. 
Thanks again, --Marcelo On 1/31/13 12:27 PM, Adam Goryachev mailingli...@websitemanagers.com.au wrote: On 01/02/13 04:04, Justin Edmands wrote: I'm on the fence about the amount of time it will take to degrade and rebuild a RAID6 at 16 drives (x2 systems). Anyone against the idea of: Backup data friday night through saturday morning stop drbd and heartbeat on node2 replace all drives on node2 build raid 6 and match setup/sizes from node1 initialize metadata, etc. start drbd and heartbeat let it sync make node2 primary repeat steps for node1 In theory, the set of drives you pulled from the secondary are an extra backup you could put all those drives back in, and make that set the primary In some ways this might be a better solution, since you are then simply doing a single large read on the primary, and a large write on the secondary no raid rebuilds, except for the initial resync on the secondary (which you might be able to skip since you know you will write to every sector very soon when drbd does the sync). 1) Stop DRBD on secondary 2) Pull all drives on secondary 3) Add all drives on secondary and build new RAID6 array 4) Enable DRBD on secondary 5) sync from primary to secondary Danger of read errors on the primary during this sync, but I would guess this is better than doing 16 rebuild's Personally, I would try to set the primary read-only during the process (if an option) so that the spare set of drives is an exact match to the primary (ie, they don't get outdated). Depends on how much downtime can be scheduled Finally, I think you have a fairly high risk with 16 drives in a single RAID6, you might consider 2 sets of 8 drives in RAID6, and do a linear concat of the two sets (or raid0). That allows you to lose any 2 out of 8 drives, instead of only 2 out of 16. Also, chances of URE on just one of the remaining 14 drives after a 2 drive failure is not a good risk I would want. 
Though depends on capacity requirements if you can use another 2 drives to ensure you don't lose the data. Just my 0.02c worth At the end of the day, the direct answer to the original question was RTFM, it really is a very nice manual, and you didn't tell us what version of DRBD you use. The rest is really off-topic for this list, maybe discuss on the linux-raid list if you are interested. Regards, Adam On Thu, Jan 31, 2013 at 11:20 AM, Adam Goryachev mailingli...@websitemanagers.com.au mailto:mailingli...@websitemanagers.com.au wrote: On 01/02/13 02:58, Marcelo Pereira wrote: Hello Everyone, I'm about to perform an upgrade on my servers and I was wondering how to do that. Here is the scenario: Server A has 16x 1Tb hard drives, under RAID-6. Server B has 16x 1Tb hard drives, under RAID-6. And both are in sync, using DRBD. I though about replacing the hard drives for 2Tb units, one by one. So, on each run, I would: * Remove a 1Tb disk * Add a 2Tb disk * Wait for it to rebuild the RAID After replacing ALL disks, I would expand the RAID unit
Re: [DRBD-user] bad side-effect: bug in inactive config stops other resources
If those time stamps are believable, then the problem occurred before you started! I don't know of drbd accessing the config decks before a drbdadm command, but it looks like you have proved that it happens. You copied, it activated! Very strange result.

I'll let the drbd authors chime in on what new files in the config directory can cause before you request something using them, but it looks like the way to do this is to do the edit elsewhere, or Save As (:w filename).

Actually, you have pacemaker and heartbeat in the mix. I imagine they do watch directories for changes. It may not be drbd at all, but drbdadm commands requested by those tools.

Thanks for the warning!

Dan

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Helmut Wollmersdorfer
Sent: Wednesday, January 23, 2013 12:02 PM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] bad side-effect: bug in inactive config stops other resources

On 23.01.2013 at 16:39, Dan Barker wrote:

conflicting use of resource section 'drbd8_1'

Looks like you forgot to edit the config sections properly in the vim ... step. Do the drbd10 decks still say drbd8?

Look at the time-stamps:

1351 [2013-01-08 - 15:28:56] cd /etc/drbd.d # my usual way to configure new drbd-resources
1352 [2013-01-08 - 15:29:29] cp -a drbd8_1.res drbd10_1.res    15:29:29
1353 [2013-01-08 - 15:29:39] cp -a drbd8_2.res drbd10_2.res    15:29:39

Jan 8 15:29:31 xen11 lrmd: [2403]: info: RA output: (xen_drbd5_1:1:monitor:stderr) drbd.d/drbd8_1.res:1: conflicting use of resource section 'drbd8_1' ...#012drbd.d/drbd10_1.res:1: resource section 'drbd8_1' first used here.    15:29:31
[...]
Jan 8 15:29:37 xen11 Xen[32084]: INFO: Xen domain www will be stopped (timeout: 26s)
Jan 8 15:29:37 xen11 Xen[32089]: INFO: Xen domain mail4 will be stopped (timeout: 26s)
Jan 8 15:29:37 xen11 Xen[32086]: INFO: Xen domain typo3 will be stopped (timeout: 26s)    # -- 15:29:37

The resources stopped *before* the vim step, even 2 seconds before the 2nd cp, 8 seconds after the 1st cp.

Helmut Wollmersdorfer
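The ordering Helmut describes can be checked mechanically. A minimal sketch, with the four timestamps copied from the shell history and syslog lines quoted in this thread; the event labels and the helper name are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the history and syslog lines in this thread.
EVENTS = [
    ("first cp (drbd8_1.res -> drbd10_1.res)", "15:29:29"),
    ("lrmd reports conflicting resource section", "15:29:31"),
    ("Xen domains www, mail4, typo3 being stopped", "15:29:37"),
    ("second cp (drbd8_2.res -> drbd10_2.res)", "15:29:39"),
]


def seconds_between(earlier: str, later: str) -> int:
    """Whole seconds from `earlier` to `later`, both HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    first = EVENTS[0][1]
    for label, stamp in sorted(EVENTS, key=lambda event: event[1]):
        print(f"+{seconds_between(first, stamp):2d}s  {stamp}  {label}")
```

Sorted this way, the monitor error and the domain stops both fall between the two cp commands, which matches the point that nothing had been edited yet when the resources went down.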
Re: [DRBD-user] Diagnosing a Failed Resource
However, I still have no idea what caused the failures.

A split brain is caused by writing to both members while they are disconnected. What in your environment caused that to occur is probably lost in logs a week gone. But, if your procedures always allow only one node (primary) to write to a resource, even if it’s disconnected, then split-brain won’t occur.

“nuke the whole thing” certainly worked. So would have following the doc to invalidate the secondary copy and then simply connect. There is an excellent chapter in the manual about split-brain.

Dan

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Eric
Sent: Monday, January 21, 2013 5:08 PM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Diagnosing a Failed Resource

I decided to nuke the whole thing and start over: On both nodes, I... snip

However, I still have no idea what caused the failures. Ideas? Suggestions?

Eric Pretorious
Truckee, CA

bigsnip
Re: [DRBD-user] Diagnosing a Failed Resource
The errors in connecting are logged. If you can't find them, attempt to connect a resource (drbdadm connect r1, for example) to create the errors again, and then look at the logs for the reason the connection was not established. The status will continue to show waiting for connection (WFC) but there will be a reason in the log files.

If the logs are unclear, post the relevant portions back here and we'll help. Something like 'dmesg | grep drbd'. You may want to check the logs on both drbd servers. You can do the connect command on either.

hth

Dan

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Eric
Sent: Monday, January 21, 2013 1:24 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Diagnosing a Failed Resource

I've configured corosync+pacemaker to manage a simple two-resource DRBD cluster:

san1:~ # crm configure show | cat -
node san1 \
        attributes standby=off
node san2 \
        attributes standby=off
primitive p_DRBD-r0 ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=60s
primitive p_DRBD-r1 ocf:linbit:drbd \
        params drbd_resource=r1 \
        op monitor interval=60s
primitive p_IP-1_253 ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.253 cidr_netmask=24 \
        op monitor interval=30s
primitive p_IP-1_254 ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.254 cidr_netmask=24 \
        op monitor interval=30s
primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget \
        params iqn=iqn.2012-11.com.example.san1:sda \
        op monitor interval=10s
primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 path=/dev/drbd0 \
        op monitor interval=10s
primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 path=/dev/drbd1 \
        op monitor interval=10s
primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 path=/dev/drbd2 \
        op monitor interval=10s
primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 path=/dev/drbd3 \
        op monitor interval=10s
primitive p_iSCSI-san2 ocf:heartbeat:iSCSITarget \
        params iqn=iqn.2012-11.com.example.san2:sda \
        op monitor interval=10s
primitive p_iSCSI-san2_0 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2012-11.com.example.san2:sda lun=0 path=/dev/drbd1000 \
        op monitor interval=10s
primitive p_iSCSI-san2_1 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2012-11.com.example.san2:sda lun=1 path=/dev/drbd1001 \
        op monitor interval=10s
primitive p_iSCSI-san2_2 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2012-11.com.example.san2:sda lun=2 path=/dev/drbd1002 \
        op monitor interval=10s
primitive p_iSCSI-san2_3 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2012-11.com.example.san2:sda lun=3 path=/dev/drbd1003 \
        op monitor interval=10s
group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_IP-1_254
group g_iSCSI-san2 p_iSCSI-san2 p_iSCSI-san2_0 p_iSCSI-san2_1 p_iSCSI-san2_2 p_iSCSI-san2_3 p_IP-1_253
ms ms_DRBD-r0 p_DRBD-r0 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
ms ms_DRBD-r1 p_DRBD-r1 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
location l_iSCSI-san1_and_DRBD-r0 p_IP-1_254 10240: san1
location l_iSCSI-san2_and_DRBD-r1 p_IP-1_253 10240: san2
colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master
colocation c_iSCSI_with_DRBD-r1 inf: g_iSCSI-san2 ms_DRBD-r1:Master
order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start
order o_DRBD-r1_before_iSCSI-san2 inf: ms_DRBD-r1:promote g_iSCSI-san2:start
property $id=cib-bootstrap-options \
        dc-version=1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore

The cluster appears to be functioning correctly:

san1:~ # crm_mon -1
Last updated: Sun Jan 20 22:20:17 2013
Last change: Sun Jan 20 21:59:15 2013 by root via crm_attribute on san1
Stack: openais
Current DC: san1 - partition with quorum
Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
2 Nodes configured, 2 expected votes
16 Resources configured.

Online: [ san1 san2 ]

Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
    Masters: [ san1 ]
    Slaves: [ san2 ]
Resource Group: g_iSCSI-san1
    p_iSCSI-san1   (ocf::heartbeat:iSCSITarget):      Started san1
    p_iSCSI-san1_0 (ocf::heartbeat:iSCSILogicalUnit): Started san1
    p_iSCSI-san1_1 (ocf::heartbeat:iSCSILogicalUnit): Started san1
    p_iSCSI-san1_2 (ocf::heartbeat:iSCSILogicalUnit): Started san1
    p_iSCSI-san1_3
Re: [DRBD-user] DRDB stalled and impossible restart, down...
I have no idea what the problem might be. But I have an idea to un-hang drbd. If you go on the primary node and disconnect the resource (drbdadm disconnect r1), maybe the processes on the secondary will respond. Saves a boot.

Are you certain about the reliability of the network layer between the drbd hosts?

Dan

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Abdelkarim Mateos Sanchez
Sent: Sunday, January 13, 2013 2:18 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] DRDB stalled and impossible restart, down...

Hi. Any reply to this question? I'm desperate. On this machine, I need to reboot the server every week because DRBD hangs. Example:

cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
    ns:0 nr:0 dw:0 dr:335534008 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:VerifyS ro:Secondary/Primary ds:UpToDate/UpToDate C r-
    ns:0 nr:1309972 dw:1309972 dr:51448776 al:0 bm:88 lo:1 pe:136721 ua:2048 ap:0 ep:1 wo:b oos:9459536
    [] verified: 4.4% (48996/51196)M
    finish: 16317:48:56 speed: 0 (0) want: 40,960 K/sec (stalled)
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
    ns:0 nr:0 dw:0 dr:209708728 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:28051588
 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
    ns:0 nr:0 dw:0 dr:209708728 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:90437120

root@pro01:~# /sbin/drbdadm verify all
Command '/sbin/drbdsetup 0 verify' did not terminate within 5 seconds
root@pro01:~#
root@pro01:~#
No response from the DRBD driver! Is the module loaded?

I would like to shut down drbd, but can't. I would like to detach r1, but can't. Desperate.

On 11/01/2013, at 10:36, Abdelkarim Mateos Sanchez abk...@tamainut.com wrote:

Hi. I'm desperate. With DRBD 8.3 (latest minor version) on Proxmox 2.2, r1.res stalled. /proc/drbd shows:

version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
    ns:0 nr:0 dw:44628 dr:335534008 al:0 bm:39 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:VerifyS ro:Secondary/Primary ds:UpToDate/UpToDate C r-
    ns:0 nr:52427164 dw:52427164 dr:3246072 al:0 bm:3200 lo:1 pe:145893 ua:2048 ap:0 ep:1 wo:b oos:1309972
    [...] verified: 6.2% (48036/51196)M
    finish: 755:24:58 speed: 16 (96) want: 40,960 K/sec (stalled)
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
    ns:0 nr:23024700 dw:127879064 dr:104854364 al:0 bm:8866 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
    ns:0 nr:79852364 dw:184706728 dr:104854364 al:0 bm:11415 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

I would like to disconnect or down the resource; any solution for this situation would do. But everything gets a timeout:

cat /var/lock/drbd-147-1
lock on /var/lock/drbd-147-1 currently held by pid:591161
State change failed: (0)unknown error.
change failed: (0)unknown error.

service drbd restart
Stopping all DRBD resources: No response from the DRBD driver! Is the module loaded?
No response from the DRBD driver! Is the module loaded?

But:

lsmod | grep drbd
drbd 342496 13

Dec 31 17:52:31 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg time expired, ko = 4294961767
Dec 31 17:52:37 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg time expired, ko = 4294961766
Dec 31 17:52:43 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg time expired, ko = 4294961765
Dec 31 17:52:49 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg time expired, ko = 4294961764
Dec 31 17:52:55 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg time expired, ko = 4294961763
Dec 31 17:53:01 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg time expired, ko = 4294961762

Trying to kill the processes does not work:

ps aux | grep drbd1
root 20189 0.0 0.0 0 0 ? S Dec28 0:17 [drbd1_worker]
root 20207 0.0 0.0 0 0 ? S Dec28 3:21 [drbd1_receiver]
root 20213 0.0 0.0 0 0 ? S Dec28 0:16 [drbd1_asender]

I would appreciate any help.

Abdelkarim Mateos Sánchez
CEO, Tamainout Hébergement, S.A.R.L. (Morocco)
CET, Tamainut IT, S.L. (Spain)
Contact | abk...@tamainut.com | Skype - mamateos
Landline Spain: +34.851000209 | Mobile Morocco: +212.671819412
islaserver.com | tamainut.tel
Re: [DRBD-user] Too many block drbdX: Out of sync
"... if the execution of /sbin/drbdadm verify all errors are corrected."

This is incorrect. Verify identifies errors; it does not correct them. To correct them, disconnect and reconnect. They are corrected at connect time.

Dan

-Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Abdelkarim Mateos Sanchez Sent: Friday, January 11, 2013 4:36 AM To: drbd-user@lists.linbit.com Subject: [DRBD-user] Too many block drbdX: Out of sync

Hi,

I use DRBD 8.3 with Proxmox VE 2.2 in a Primary/Secondary configuration on machines with Gigabyte network cards. Basically I use it to keep a copy of the LVs of the primary machine. When I run /sbin/drbdadm verify all, I find many messages like this:

Dec 31 09:18:04 pro01 kernel: block drbd2: Out of sync: start=147170512, size=248 (sectors)
Dec 31 09:18:04 pro01 kernel: block drbd2: Out of sync: start=147195176, size=2504 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147218992, size=11472 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147230472, size=104 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147231992, size=1200 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147233280, size=96 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147233408, size=16 (sectors)
Dec 31 09:18:05 pro01 kernel: block drbd2: Out of sync: start=147233560, size=32 (sectors)
Dec 31 09:18:07 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg time expired, ko = 4294966911
Dec 31 09:18:09 pro01 kernel: block drbd2: Out of sync: start=147493960, size=8 (sectors)

In the global configuration I use:

syncer {
  # rate after al-extents use-rle cpu-mask verify-alg csums-alg
  verify-alg sha1;
  rate 40M;
}

In each resource:

resource r1 { # kvm420
  protocol C;
  startup {
    wfc-timeout 15; # non-zero wfc-timeout can be dangerous
(http://forum.proxmox.com/threads/3465-Is-it-safe-to-use-wfc-timeout-in-DRBD-configuration)
    degr-wfc-timeout 60;
  }
  net {
    cram-hmac-alg sha1;
    shared-secret CDwCYfY7420s;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  on pro01 {
    device /dev/drbd1;
    disk /dev/sata/vm-420-disk-1;
    address XXX.XXX.XXX.XXX:7789;
    meta-disk internal;
  }
  on pro02 {
    device /dev/drbd1;
    disk /dev/pve/vm-420-disk-1;
    address XXX.XXX.XXX.XXX:7789;
    meta-disk internal;
  }
}

I also do not understand whether running /sbin/drbdadm verify all corrects the errors. Help is appreciated, as I am somewhat new to the suod and DRBD, and I apologize for my Google Translate English. Happy end of year...

Abdelkarim Mateos Sánchez
CEO Tamainout Hébergement, S.A.R.L. (Morocco)
CET Tamainut IT, S.L. (Spain)
Contact | abk...@tamainut.com | Skype - mamateos
Landline Spain: +34.851000209 | Mobile Morocco: +212.671819412
islaserver.com | tamainut.tel
This message is intended exclusively for its addresse and may contain information that is CONFIDENTIAL and protected by professional privilege. If you are not the intended recipient you are hereby notified that any dissemination, copy or disclosure of this communication is strictly prohibited by law. If this message has been received in error, please immediately notify us via e-mail and delete it. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
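To get a feel for how much data the "Out of sync" lines in this thread add up to, the kernel messages can be totalled per device. The following is only a minimal sketch: the function name out_of_sync_bytes and the regular expression are illustrative, and it assumes the sizes are reported in 512-byte sectors.

```python
import re

# Matches kernel lines such as:
#   block drbd2: Out of sync: start=147170512, size=248 (sectors)
OOS_LINE = re.compile(
    r"block (drbd\d+): Out of sync: start=(\d+), size=(\d+) \(sectors\)"
)
SECTOR_BYTES = 512  # assumption: the log counts 512-byte sectors


def out_of_sync_bytes(log_text):
    """Return {device: total out-of-sync bytes} over all matching log lines."""
    totals = {}
    for device, _start, size in OOS_LINE.findall(log_text):
        totals[device] = totals.get(device, 0) + int(size) * SECTOR_BYTES
    return totals


# Three lines taken from the log excerpt in the thread; the last one is not
# an "Out of sync" message and is ignored.
sample = """\
Dec 31 09:18:04 pro01 kernel: block drbd2: Out of sync: start=147170512, size=248 (sectors)
Dec 31 09:18:04 pro01 kernel: block drbd2: Out of sync: start=147195176, size=2504 (sectors)
Dec 31 09:18:07 pro01 kernel: block drbd1: [drbd1_worker/20189] sock_sendmsg time expired, ko = 4294966911
"""

print(out_of_sync_bytes(sample))
```

Feeding it the full kernel log gives one total per drbd device, which is easier to track from week to week than the raw lines.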
[DRBD-user] drbd0: State change failed: Device is held open by someone
Problem: The problem is that when performing an HA failover from server A to server B, a DRBD resource is sometimes not shut down properly on server A. Several attempts are made to stop the DRBD resource, but finally it gives up and the server is rebooted. The failover to server B works properly; B becomes the Active server. After the reboot, server A comes up properly as the Standby server. The problem is intermittent. Most HA failovers work as expected (server A does not reboot). When the problem does occur, the following lines are logged in /var/log/messages and displayed on the OOBM: drbd0: State change failed: Device is held open by someone drbd0: state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- } drbd0: wanted = { cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate r--- } Heartbeat: 2.1.4 Drbd: 8.0.11 kernel-module-drbd: 2.6.18 lvm2: 2.02.42-5 ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] very large out-of-sync (oos) value yet drbd-overview claims UpToDate/UpToDate
There is an on-error event handler. Mine sends me email if verify fails (runs weekly, one resource each of M, Tu, W, Th nights). Dan In my Global handlers section: out-of-sync /usr/lib/drbd/notify-out-of-sync.sh myemail; -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lonni J Friedman Sent: Wednesday, October 31, 2012 6:02 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] very large out-of-sync (oos) value yet drbd-overview claims UpToDate/UpToDate I've got a drbd setup with 8.3.11. I ran a manual verify, and once it completed it reported: [23479.620066] block drbd0: Online verify done (total 23136 sec; paused 0 sec; 73748 K/sec) [23479.702176] block drbd0: Online verify found 9651098 4k block out of sync! [23479.745988] block drbd0: conn( VerifyT - Connected ) [23479.788996] block drbd0: helper command: /sbin/drbdadm out-of-sync minor-0 [23479.839348] block drbd0: helper command: /sbin/drbdadm out-of-sync minor-0 exit code 0 (0x0) [23479.961245] block drbd0: bitmap WRITE of 2763 pages took 34 jiffies [23480.006527] block drbd0: 37 GB (9651098 bits) marked out-of-sync by on disk bit-map. This isn't entirely surprising, as the secondary node was down for a long time due to hardware problems. However, what is surprising is that drbd-overview still reports that everything is UpToDate: $ drbd-overview 0:sdb Connected Secondary/Primary UpToDate/UpToDate C r- Shouldn't this huge number of out of sync bits cause drbd-overview to report something other than UpToDate for the Secondary node? If not, then how does one actually programattically detect that a verification has failed? Parsing dmesg is going to be a huge kludge, and not likely to be reliable. thanks ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
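The two figures in the quoted kernel log (9651098 blocks of 4k, and "37 GB") can be checked against each other with a little arithmetic. A small sketch, assuming the "4k" in the log means 4096 bytes:

```python
BLOCK_BYTES = 4096  # assumption: "4k block" in the log means 4096 bytes


def oos_blocks_to_bytes(blocks):
    """Convert a count of out-of-sync 4k blocks to bytes."""
    return blocks * BLOCK_BYTES


oos_bytes = oos_blocks_to_bytes(9651098)
oos_gib = oos_bytes / 2**30

print(oos_bytes, "bytes, about", round(oos_gib, 1), "GiB")
```

The result is a little under 37 GiB, which agrees with the rounded value DRBD prints.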
Re: [DRBD-user] very large out-of-sync (oos) value yet drbd-overview claims UpToDate/UpToDate
I don't know anything about drbd-overview, I just cat /proc/drbd. But, I bet it's echoing the same information. drbd keeps all the bytes in sync that it knows about (UpToDate). The changes it doesn't know about are found by verify. Disconnect/Connect syncs them back up. If you start with dirty disks, set up drbd and do not sync them, and mkfs a file system on the primary, the disks will be absolutely UpToDate in the blocks that matter for the file system, and horribly out of sync in the blocks that don't matter to anybody at all. Verify will find the oos blocks and mark them for syncing, but the hypothetical file system is still consistent. Just do the Disconnect/Connect and you'll have oos zero AND UpToDate. Dan -Original Message- From: Lonni J Friedman [mailto:netll...@gmail.com] Sent: Thursday, November 01, 2012 4:31 PM To: Dan Barker Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] very large out-of-sync (oos) value yet drbd-overview claims UpToDate/UpToDate Thanks, that answers my 2nd question, but not my 1st question. Shouldn't drbd-overview be treating this as a not UpToDate scenario? On Thu, Nov 1, 2012 at 6:08 AM, Dan Barker dbar...@visioncomm.net wrote: There is an on-error event handler. Mine sends me email if verify fails (runs weekly, one resource each of M, Tu, W, Th nights). Dan In my Global handlers section: out-of-sync /usr/lib/drbd/notify-out-of-sync.sh myemail; -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lonni J Friedman Sent: Wednesday, October 31, 2012 6:02 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] very large out-of-sync (oos) value yet drbd-overview claims UpToDate/UpToDate I've got a drbd setup with 8.3.11. I ran a manual verify, and once it completed it reported: [23479.620066] block drbd0: Online verify done (total 23136 sec; paused 0 sec; 73748 K/sec) [23479.702176] block drbd0: Online verify found 9651098 4k block out of sync! 
[23479.745988] block drbd0: conn( VerifyT - Connected ) [23479.788996] block drbd0: helper command: /sbin/drbdadm out-of-sync minor-0 [23479.839348] block drbd0: helper command: /sbin/drbdadm out-of-sync minor-0 exit code 0 (0x0) [23479.961245] block drbd0: bitmap WRITE of 2763 pages took 34 jiffies [23480.006527] block drbd0: 37 GB (9651098 bits) marked out-of-sync by on disk bit-map. This isn't entirely surprising, as the secondary node was down for a long time due to hardware problems. However, what is surprising is that drbd-overview still reports that everything is UpToDate: $ drbd-overview 0:sdb Connected Secondary/Primary UpToDate/UpToDate C r- Shouldn't this huge number of out of sync bits cause drbd-overview to report something other than UpToDate for the Secondary node? If not, then how does one actually programattically detect that a verification has failed? Parsing dmesg is going to be a huge kludge, and not likely to be reliable. thanks ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
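On the question of detecting a failed verify programmatically without parsing dmesg: besides the out-of-sync handler mentioned in the reply, one option is to read the oos counter from /proc/drbd. The sketch below is illustrative only. The function names are made up, the sample text imitates the DRBD 8.3 layout shown elsewhere in this archive, and it assumes oos is reported in KiB.

```python
import re

# One entry per minor: "<minor>: cs:<state> ... oos:<n>".
# re.S lets the match span the line break between the two status lines.
MINOR_RE = re.compile(r"\b(\d+): cs:(\S+).*?oos:(\d+)", re.S)


def parse_proc_drbd(text):
    """Return {minor: (connection_state, oos_kib)} parsed from /proc/drbd text."""
    return {
        int(minor): (state, int(oos))
        for minor, state, oos in MINOR_RE.findall(text)
    }


def minors_out_of_sync(text):
    """Return the sorted minors whose oos counter is non-zero."""
    return sorted(m for m, (_cs, oos) in parse_proc_drbd(text).items() if oos > 0)


# Sample in the DRBD 8.3 layout (assumed; adjust the pattern for other versions).
sample = """\
version: 8.3.13 (api:88/proto:86-96)
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:335534008 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:VerifyS ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:1309972 dw:1309972 dr:51448776 al:0 bm:88 lo:1 pe:136721 ua:2048 ap:0 ep:1 wo:b oos:9459536
"""

print(parse_proc_drbd(sample))
print(minors_out_of_sync(sample))
```

On a real node the text would come from reading /proc/drbd after the verify has finished; a non-empty result means a disconnect/connect is needed to resync.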
Re: [DRBD-user] DRBD Versions
I don't understand the hubbub over compiling this thing. My first DRBD was on Linux From Scratch, a distribution where everything is done by hand, so there was no package manager available. I found the install quite simple. My most recent upgrade consisted of these easy steps and took about 2 minutes. Selecting the configure parameters for your environment (step 5) might take a little more time, but not counting backups, I'd say 10 minutes ought to do it.

There are some prereqs, but they are probably on your system already: make gcc libc6 flex linux-headers-`uname -r` libc6-dev libssl-dev.

1. cd /usr/src/
2. wget http://oss.linbit.com/drbd/8.4/drbd-8.4.2.tar.gz
3. tar -xzf drbd-8.4.2.tar.gz
4. cd drbd-8.4.2/
5. ./configure --with-km --prefix /usr --sysconfdir /etc --localstatedir /var
6. make clean all
7. make install

I hope seeing it laid out in all its simplicity encourages you to give it a try. Heck, fire up a virtual machine or two and experiment. That's the fun part of our jobs anyhow.

Dan

-Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Adam Goryachev Sent: Tuesday, October 23, 2012 6:51 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] DRBD Versions

Hi,

I've been using DRBD on a couple of systems for a while, and have just used the version that came with my distro (Debian stable (squeeze)), since I really don't like maintaining a build from source and remembering to upgrade, recompile, etc. each time there is a new version. However, more and more, it would seem that my current 8.3.7 (Debian package 2:8.3.7-2.1) is probably missing a lot of bug fixes, but on checking, Debian testing only has 8.3.13, and even Debian unstable has only 8.3.13.

So the question is, should I just bite the bullet and install DRBD from source? I notice from http://www.drbd.org/download/packages/ that DRBD is integrated into the vanilla kernel 2.6.33 or newer.
If I upgrade my debian stable kernel (2.6.32) to a newer version (either debian testing or debian-backports) 3.2 based kernel, can I then just download, compile, and install the latest 8.4.x version of DRBD? Thank you for your suggestions/comments Regards, Adam -- Adam Goryachev Website Managers www.websitemanagers.com.au ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] IO Error Logging
My system logs them with timestamps. They just happen to be relative to boot time. I run, say, dmtime 12345 and see the real time stamp.

cat /usr/local/bin/dmtime
date --date=@$((($(date --date="$(ls -ld --time-style='+%Y-%m-%d %H:%M' /proc/1 | awk '{print $6,$7}')" +%s) + $1)))

You may need to muck with the script to make it match your system's peculiarities; shown is for Debian.

Dan

-Original Message- From: Felix Frank [mailto:f...@mpexnet.de] Sent: Sunday, October 07, 2012 9:09 AM To: Andrew Eross Cc: Dan Barker; drbd-user@lists.linbit.com Subject: Re: [DRBD-user] IO Error Logging

Hi,

On 10/06/2012 03:38 AM, Andrew Eross wrote: Below is what I'm seeing in dmesg.

No timestamps? Bummer. Does your system log those via syslog too (in Debian, typically /var/log/kern.log)? That log typically has far superior timestamps even.

Cheers, Felix
___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
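The idea behind the dmtime helper above is simply boot time plus the dmesg offset in seconds. A minimal sketch of that arithmetic follows; the function name and the boot time in the example are hypothetical, and how the boot time is obtained is left to the caller.

```python
from datetime import datetime, timedelta


def dmesg_to_wallclock(boot_time, seconds_since_boot):
    """Convert a dmesg offset (seconds since boot) to a wall-clock time."""
    return boot_time + timedelta(seconds=seconds_since_boot)


# Hypothetical boot time, for illustration only.
boot = datetime(2012, 10, 1, 8, 0, 0)
stamp = dmesg_to_wallclock(boot, 12345)
print(stamp.isoformat())
```

As with the shell version, the result is only as accurate as the boot time: a boot time known to the minute gives timestamps good to the minute.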
Re: [DRBD-user] Does oversize disk hurt anything?
-Original Message- From: Florian Haas [mailto:flor...@hastexo.com] Sent: Sunday, October 07, 2012 5:46 PM To: Dan Barker Cc: drbd List Subject: Re: [DRBD-user] Does oversize disk hurt anything?

On Sun, Oct 7, 2012 at 2:20 PM, Dan Barker dbar...@visioncomm.net wrote:

Well if you had created a partition (/dev/sdc1) rather than use the full disk (/dev/sdc), then you could have set up that partition to match the size of the disk on your primary.

Partition. Great idea. If I had thought of that, I'd have bought only one new 500G disk instead of two. Thanks for the hint.

1T disks cost the same as 500G these days. The physical device sizes differing isn't a problem at all; DRBD will just select the smaller size of the two.

I know drbd is just using the outside 500G of the oversize disk. It's just that the metadata is near the hub. A partition would have placed it mid-disk, but I didn't think of that.

Why? Your cluster manager (typically Pacemaker) should take care of that for you.

No cluster manager, no HA. Easy manual failover. This is a lab environment and HA is not really needed. The users of drbd storage are ESXi hosts. To take the primary server off line I:

DrbdR0: drbdadm primary all (allow dual primaries is on)
DrbdR0: start iet
ESXi (all): verify all four paths to both drbd are online

We may have had this discussion before, but: http://fghaas.wordpress.com/2011/11/29/dual-primary-drbd-iscsi-and-multipath-dont-do-that/

Thanks for the help.

Pleasure. Cheers, Florian

Of course I've been following the dont-do-that threads. I've been down that path several times. It works great for a while and then doesn't <g>. But that was a couple of years ago. What I am currently doing is different; the exposure is very brief, if at all. When the second DRBD publishes its iSCSI paths, ESXi discovers them but continues to use the original path for all I/O. It's not concurrent multipath.
Only when the original path dies (when I stop iet on the primary drbd) does ESXi switch to active I/O on the other path. I think your fears are about simultaneous dual-access, not about what I'm doing. I don't think I'd recommend anyone else do it this way, it's just the way I'm doing it with the hardware laying around. Thanks for the feedback. Here's some feedback for you: drbd is Great! Thanks for making it available. Best wishes for you at Hastexo. Dan ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
[DRBD-user] Does oversize disk hurt anything?
I just lost a disk on my secondary node. I looked EVERYWHERE and can't find the spare disks I bought for such an occurrence. So, I put in a handy disk, twice the size.

drbdadm create-md r1
drbdadm attach r1

and off we go. If memory serves, create-md will build the meta-data at the END of the disk. Won't that cause a lot of seeks to the hub, when seeking to about the middle of the platters would have done the trick, had the metadata been at the same offset as on the primary?

Dan

version: 8.4.0 (api:1/proto:86-100) GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@DrbdR0, 2012-05-28 12:09:30 (Yes, I know. I need to upgrade.)

Failed disk: WD 500G
Replaced by: WD 1T
On server: DrbdR0

cat /etc/drbd.d/r1.res
resource r1 {
  on DrbdR0 {
    volume 0 {
      device /dev/drbd1 minor 1;
      disk /dev/sdc;
      meta-disk internal;
    }
    address ipv4 10.20.30.46:7790;
  }
  on DrbdR1 {
    volume 0 {
      device /dev/drbd1 minor 1;
      disk /dev/sdc;
      meta-disk internal;
    }
    address ipv4 10.20.30.47:7790;
  }
  startup {
    become-primary-on DrbdR1;
  }
}
___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] IO Error Logging
dmesg | grep sr1 should show you all you need to know.

Dan (there's that word "should" again <g>)

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Andrew Eross Sent: Friday, October 05, 2012 2:17 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] IO Error Logging

Hi guys,

I'm trying to debug an SSD that's the backing device for my secondary node. The primary/secondary are sync'd (protocol C) and everything goes fine until I get to testing fail-over, e.g. on the primary drbdadm secondary drbd-sr1, and on the secondary drbdadm primary drbd-sr1. When I do this the secondary locks up for about 5 minutes (the SSH session drops), then it starts responding again and I see drbd has dropped into diskless mode.

I'm thinking there might be IO errors occurring with the underlying disk and perhaps drbd is automatically detaching it. Right now I'm running badblocks on the backing device to see if it can find any problems.

In the meantime I've been trying to figure out how to get more information about IO errors from drbd. My devices are configured with detach as recommended (http://www.drbd.org/users-guide/s-configure-io-error-behavior.html); however, I'm not sure how to find out more information about when this event occurs. Are there any debugging options I can enable that would help me see the IO error details that caused a detach?

Thanks! Andrew
___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Status Mismatch
The sync hasn't finished. It's at 100%, but still doing cleanup at end-of-task. When it completes, you'll see the correct status. Inconsistent is the VALID status until the sync finishes. When the progress bar goes away, it's really done. Check the logs if you think it's hung there too long. Dan From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of J.R. Lillard Sent: Wednesday, September 19, 2012 9:46 AM To: drbd-user@lists.linbit.com Subject: [DRBD-user] Status Mismatch What would cause two nodes to show different statuses? Primary 10: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent A r- ns:154708236 nr:0 dw:155723616 dr:478644485 al:1603514 bm:41448 lo:1 pe:180 ua:0 ap:179 ep:1 wo:f oos:20 [===] sync'ed:100.0% (20/21460)K finish: 0:00:00 speed: 16 (16) K/sec Secondary 10: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r- ns:0 nr:114229240 dw:114229240 dr:0 al:0 bm:26155 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0 -- J.R. Lillard System / Network Admin Web Programmer Golden Heritage Foods 120 Santa Fe St. Hillsboro, KS 67063 ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] What to do about read errors on the primary?
I have read errors on the primary side, which caused the secondary to go into an inconsistent state. It's a shame you lost the logs. They would have said much. When drbd loses a primary disk, it continues to work, read/write, using the secondary disk. The active node will remain primary, the standby node will remain secondary, but the disk state will be diskless/uptodate. All I/O is going over the wire now, reads and writes; not just writes as is the normal (uptodate/uptodate) case. You have described a result different than that, so the precipitating events must be different too. hth Dan -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Alan Robertson Sent: Tuesday, September 18, 2012 12:06 PM To: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] What to do about read errors on the primary? There was another note mentioning backups... DRBD is designed to protect against server and disk failures. Backups primarily protect against human errors, disasters and so on - and I do have backups... Snarky comments aren't very helpful and don't have much place in civil discourse except maybe with your friends. The fact that you don't want your system to recover from I/O errors is your choice. I'm funny that way - I want my system to do all it can to recover from problems, and minimize data loss... In this case, I have a disk failure which I am having trouble getting DRBD to protect me against. I'm perfectly willing to accept that I should have configured things differently - which would be why I came here asking for help. In the 10+ years I've been using and recommending DRBD, it's never come up for me before. On 09/18/2012 05:16 AM, Lars Ellenberg wrote: Alan Robertson al...@unix.sh schrieb: I have read errors on the primary side, which caused the secondary to go into an inconsistent state. This means that the disk which desperately needs backing up, is no longer being backed up (!). 
In an ideal world, it seems to me what one would like for DRBD to do would be: get the data from the secondary write it to the primary - which often fixes read errors continue on syncing everything else to the secondary Well, we don't do this yet. We detach the faulty disk, and resync when you reattach. Platform, kernel version, drbd version, configuration and logs... ;-) I actually figured you'd just tell me what I needed to change - so I didn't go grab them the first time. Nevertheless - mea culpa... ;-) The original occurrence is lost to antiquity, unfortunately so logs could only be recent - not when it first happened. I included a good bit of recent logs from both sides. I grepped out drbd issues. Let me know what else you want. What _seems_ to have happened, is that the primary continued on, and the secondary became inconsistent because the two sides were disconnected. Attempts to resync the two failed because of the read error on the primary - making it impossible to switch to the secondary using normal methods. 
Linux paul 2.6.38-15-server #66-Ubuntu SMP Tue Aug 14 17:42:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux drbd8-utils 2:8.3.9-1ubuntu1 RAID 1 over tcp/ip for Linux utilities $ cat /proc/drbd (on 'paul' - primary for the problematic partition) version: 8.3.9 (api:88/proto:86-95) srcversion: 8925C35502BC976C622CF7A 0: cs:Connected ro:Primary/Secondary ds:UpToDate/Inconsistent C r- ns:3173100 nr:0 dw:3244204 dr:20104349 al:5775 bm:836 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:512 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r- ns:1420 nr:5328 dw:6780 dr:6866 al:0 bm:178 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0 paul:/etc/drbd.d $ cat home.res (this is the partition with problems) resource home { device /dev/drbd0; meta-disk internal; on paul { disk /dev/paul/home; address 10.1.1.31:7789; } on silas { disk /dev/silas/home; address 10.1.1.32:7789; } } paul:/etc/drbd.d $ cat etc.res (this partition is a happy camper) resource etc { device /dev/drbd1; meta-disk internal; on paul { disk /dev/paul/etc; address 10.1.1.31:7790; } on silas { disk /dev/silas/etc; address 10.1.1.32:7790; } } paul:/etc/drbd.d $ cat global_common.conf global { usage-count yes; # minor-count dialog-refresh disable-ip-verification
Re: [DRBD-user] What to do about read errors on the primary?
"shot myself in the foot somewhere along the line"

I'm glad you don't need any help on that subject. I have much experience shooting my own foot; I'm glad I don't need to share it with you <g>.

If the primary's disk is the best you've got, and it's worth some file corruption (drbd abhors any single-bit difference from primary to secondary), I think the best course (which will probably crash and burn) is to dd the contents of the primary disk to a new, hopefully identical disk. On error, dd will probably stop. You can then restart it beyond the bad spot with skip= on the input and the same seek= on the output. After fewer than 200 tries, you'll have a copy of the readable blocks on a disk which will run with no read errors, although there will be junk in the places that were bad before. Attach that to drbd, connect the secondary with discard-my-data and let them sync up. Then fsck and hold on to your shorts. AFAICT, that's going to be your best (only?) shot.

Not knowing what you did this time makes it difficult to direct you not to do that again, but I'm going to try: don't do that again. A simple suggestion is to do a weekly verify with email to you if anything is amiss. Of course, even that can fail. No email means no verify error, but it doesn't mean the CPU didn't overheat and shut down one of the nodes (happened to me a couple of weeks back; a $2 fan).

hth
Dan

-Original Message- From: Alan Robertson [mailto:al...@unix.sh] Sent: Tuesday, September 18, 2012 1:24 PM To: Dan Barker Subject: Re: [DRBD-user] What to do about read errors on the primary?

On 09/18/2012 10:24 AM, Dan Barker wrote:

I have read errors on the primary side, which caused the secondary to go into an inconsistent state.

It's a shame you lost the logs. They would have said much. When drbd loses a primary disk, it continues to work, read/write, using the secondary disk. The active node will remain primary, the standby node will remain secondary, but the disk state will be diskless/uptodate.
All I/O is going over the wire now, reads and writes; not just writes as is the normal (uptodate/uptodate) case. You have described a result different than that, so the precipitating events must be different too. Thanks for the description of how it's supposed to work in this case. I didn't really know. I may have shot myself in the foot somewhere along the line too... I certainly wouldn't count that out. :-D The reason why the logs were lost is that I didn't notice for a long time... It could have been many months. This is my home system. It's actually been many years since I had a disk failure... What I noticed was that some failover tests I was performing didn't work - it insisted on leaving things on the (now-broken) primary side. I then noticed the DRBD state wasn't in sync (and even that was a month or so ago - life has been busy). I tried to bring them into sync using a variety of techniques that didn't work. _Then_ I noticed the I/O errors. The I/O errors are near the end of the disk. I wonder if some of the I/O errors were in the bitmap? But after screwing around, and probably shooting myself in the foot, I'd like for the two sides to continue to try and stay in sync as much as they can. I don't want the synchronization to stop just because there might be an I/O error on one block. Or at least, I _think_ that's what I want. [In my case, of course, it was a lot more than one block - but less than 200]. In my case, the only absolutely up-to-date copy I have is in this failing drive. Not what I wanted... I may have caused this by my flailing around trying to make failover work. -- Alan Robertson al...@unix.sh - @OSSAlanR Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
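The salvage procedure suggested in this thread (copy what is readable, step past each bad spot, accept junk where the reads failed) can be sketched in a few lines. This is only an illustration of the idea and not a tested recovery tool: salvage_copy and FlakySource are hypothetical names, the failing reads are simulated in memory, and unreadable blocks are zero-filled. GNU dd offers conv=noerror,sync for roughly the same effect in a single pass.

```python
import io


def salvage_copy(src, dst, total_blocks, block_size=4096):
    """Copy block by block, zero-filling any block whose read raises OSError.

    Returns the list of block indices that could not be read.
    """
    bad_blocks = []
    for index in range(total_blocks):
        offset = index * block_size
        try:
            src.seek(offset)
            data = src.read(block_size)
        except OSError:
            data = b""
            bad_blocks.append(index)
        # Keep source and destination offsets aligned; short or failed
        # reads are padded with zero bytes.
        dst.seek(offset)
        dst.write(data.ljust(block_size, b"\0"))
    return bad_blocks


class FlakySource(io.BytesIO):
    """Hypothetical stand-in for a failing disk: reads at given offsets fail."""

    def __init__(self, data, bad_offsets):
        super().__init__(data)
        self.bad_offsets = set(bad_offsets)

    def read(self, size=-1):
        if self.tell() in self.bad_offsets:
            raise OSError("simulated read error")
        return super().read(size)


# Three 4-byte blocks; the middle one is unreadable.
source = FlakySource(b"AAAABBBBCCCC", bad_offsets={4})
target = io.BytesIO()
bad = salvage_copy(source, target, total_blocks=3, block_size=4)
print(bad, target.getvalue())
```

The returned list of bad block indices tells you which parts of the copy hold filler instead of data, which is useful to know before running fsck on the result.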
Re: [DRBD-user] slow write performance
a) Use oflag=direct or dd will just test caching.
b) You show a 183MB/s rate, which is pretty good. However, the target appears not to be the drbd volume, you don't say if the resource is connected or waiting for connection, and you don't describe the underlying hardware.
c) How is /mnt/mysql/syncing_drbd related to /opt/drbd-test.loop or /dev/drbd1?

Dan (the top poster)

-Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of rahulcs Sent: Tuesday, September 11, 2012 4:43 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] slow write performance

Hi,

I am using DRBD for syncing files between two machines on a LAN. I have created a device on a file using:

dd if=/dev/zero of=/opt/drbd-test.loop bs=1M count=200
losetup /dev/loop1 /opt/drbd-test.loop

My drbd resource settings are as follows:

root@obelix101:/tmp# drbdsetup 1 show
disk {
    size 0s _is_default; # bytes
    on-io-error pass_on _is_default;
    fencing dont-care _is_default;
    max-bio-bvecs 0 _is_default;
}
net {
    timeout 60 _is_default; # 1/10 seconds
    max-epoch-size 8000;
    max-buffers 8000;
    unplug-watermark 16;
    connect-int 10 _is_default; # seconds
    ping-int 10 _is_default; # seconds
    sndbuf-size 0 _is_default; # bytes
    rcvbuf-size 0 _is_default; # bytes
    ko-count 0 _is_default;
    after-sb-0pri disconnect _is_default;
    after-sb-1pri disconnect _is_default;
    after-sb-2pri disconnect _is_default;
    rr-conflict disconnect _is_default;
    ping-timeout 5 _is_default; # 1/10 seconds
    on-congestion block _is_default;
    congestion-fill 0s _is_default; # byte
    congestion-extents 127 _is_default;
}
syncer {
    rate 1024000k; # bytes/second
    after -1 _is_default;
    al-extents 3389;
    on-no-data-accessible io-error _is_default;
    c-plan-ahead 0 _is_default; # 1/10 seconds
    c-delay-target 10 _is_default; # 1/10 seconds
    c-fill-target 0s _is_default; # bytes
    c-max-rate 102400k _is_default; # bytes/second
    c-min-rate 4096k _is_default; # bytes/second
}
protocol C;
_this_host {
    device minor 1;
    disk /dev/loop1;
meta-disk internal; address ipv4 192.168.245.101:7789; } _remote_host { address ipv4 192.168.245.102:7789; } I am getting really bad write performance: root@obelix101:/tmp# time dd if=/dev/zero of=/mnt/mysql/syncing_drbd bs=1M count=12 12+0 records in 12+0 records out 12582912 bytes (13 MB) copied, 0.0686793 s, 183 MB/s real3m49.773s user0m0.000s sys 0m0.168s What am i doing wrong ? Thanking you, Rahul -- View this message in context: http://old.nabble.com/slow-write-performance-tp34419641p34419641.html Sent from the DRBD - User mailing list archive at Nabble.com. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
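A minimal sketch of point (a), assuming GNU dd: with oflag=direct the write bypasses the page cache, so the reported rate reflects the storage path rather than RAM. The helper name dd_direct_cmd is hypothetical; it only prints the command so it can be reviewed before being run against a file on the mounted DRBD device.

```shell
#!/bin/sh
# Hypothetical helper: print a dd invocation that bypasses the page cache.
# oflag=direct is a GNU dd option; target path and block count are parameters.
dd_direct_cmd() {
    target=$1
    count=$2
    echo "dd if=/dev/zero of=$target bs=1M count=$count oflag=direct"
}

# The same 12 MiB write as in the original post, but uncached.
dd_direct_cmd /mnt/mysql/syncing_drbd 12
```

Comparing the uncached rate with the cached one from the post is one way to see how much of the 183 MB/s came from caching.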
Re: [DRBD-user] DRBD volumes have different files, but DRBD reports them being in sync
See below -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Bala Ramakrishnan Sent: Saturday, September 01, 2012 4:24 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] DRBD volumes have different files, but DRBD reports them being in sync I have a DRBD installation on two machines running Centos 6.3 and DRBD 8.4.1 I have a single resource 'agalaxy' being synced across these two machines. This resource has two volumes: Volume 0: /dev/drbd0 mounted on /a10data And Volume 1: /dev/drbd1 mounted on /a10. Volume 0 is running Postgres. I did a lot of other activities with DRBD shutdown. Activities with DRBD shutdown? Did any of these activities affect lv_a10data or lv_a10? If so, the DRBD metadata won't know about it and can't sync it. However after a while, I found that the contents in the directory /a10data on one of the machines was different (some intermediate level directories were missing), yet DRBD (cat /proc/drbd) reported that the file systems were in sync. You can't view the secondary node - it can't be mounted. So what steps are you not telling us? Ultimately, I had to re-initialize and resync the volume by invalidating it: Drbdadm invalidate agalaxy/0 Careful that you do this on the correct node. One of the nodes should have the correct data and the other should have obsolete data. DRBD can sync in either direction. Until/unless you identify how you got the volumes to diverge, you'll probably repeat the problem. If the DRBD nodes get disconnected AND you modify the secondary to primary and mount/write the disk AND you then reconnect the nodes, DRBD will recognize the situation (called Split-brain). You didn't mention split brain, so you must have written to one of the LVs while DRBD was down. Has anyone run into this kind of issue? 
In summary, you can't say DRBD reports them being in sync when DRBD is down, and you can't access the secondary volumes with DRBD is up - Something is missing from your question. Dan === For example, on balar-lnx3, the contents of /a10data/db/data/system was: [root@balar-lnx3 system]# ls 12531 12664 12670 12675 12681 12687 12779 12531_fsm 12666 12671 12677 12682 12688 pg_control 12531_vm 12667 12672 12678 12683 12773 pg_filenode.map 12533 12668 12672_fsm 12679 12683_fsm 12775 pg_internal.init 12534 12668_fsm 12672_vm 12679_fsm 12683_vm 12777 pgstat.stat 12662 12668_vm 12674 12679_vm 12685 12778 The contents of /a10/data/db/data/system on balar-lnx was: [root@balar-lnx system]# ls base pg_ident.conf pg_stat_tmp PG_VERSION global pg_multixact pg_subtrans pg_xlog pg_clog pg_notify pg_tblspcpostgresql.conf pg_hba.conf pg_serial pg_twophase postmaster.opts [root@balar-lnx system]# The contents of /a10data/db/data/system on balar-lnx3 was actually the contents of /a10data/db/data/system/global on balar-lnx. Yet, DRBD was reporting the status: [root@balar-lnx3 ~]# cat /proc/drbd version: 8.4.1 (api:1/proto:86-100) GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil@Build64R6, 2012-04-17 11:28:08 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r- ns:0 nr:173009724 dw:173009724 dr:0 al:0 bm:10560 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r- ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0 [root@balar-lnx3 ~]# #= Here is my drbd conf: 1. 
Global: global { usage-count yes; # minor-count dialog-refresh disable-ip-verification } common { handlers { pri-on-incon-degr /usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b /proc/sysrq-trigger ; reboot -f; pri-lost-after-sb /usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b /proc/sysrq-trigger ; reboot -f; local-io-error /usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o /proc/sysrq-trigger ; halt -f; } startup { wfc-timeout 0; degr-wfc-timeout 120; } options { # cpu-mask on-no-data-accessible } disk { on-io-error detach; } net { protocol C; } } 2. Agalaxy.res: resource agalaxy { disk { resync-rate 100M; fencing resource-only; } handlers { fence-peer /usr/lib/drbd/crm-fence-peer.sh; after-resync-target /usr/lib/drbd/crm-unfence-peer.sh; } on balar-lnx { address 10.0.1.1:7788; volume 0 { device /dev/drbd0; disk/dev/vg_balarlnx3/lv_a10data; meta-disk internal; } volume 1 { device /dev/drbd1
Re: [DRBD-user] using truck based replication, but trying to save some gas
Hi, On Mon, Aug 6, 2012 at 1:28 AM, Two Spirit twospirit6...@gmail.com wrote: http://www.drbd.org/users-guide/s-using-truck-based-replication.html I've already got remote sites (small sites with slow bw) that can hold data, and I'm regularly getting new data that I'd like to replicate remotely to multiple sites. I'm evaluating using drbd for that, but I wanted some torrent type technology to replicate to multiple sites. I currently use rsync, but revering the rsync in a recovery scenario is not desireable. I have used the shipping method to ship and seed to a remote backup site, but I'd like to avoid mainly because from the recovery point of view, when a major disaster happens, I want to start recoverying and utilize more bandwidth than what one remote site can upload, and maybe I can put myself in a situation where recovery would be faster than the truck method for recovery especially since the small sites use DSL and upload speeds are way smaller than their DL speeds. For low bandwidth connections, there's DRBD Proxy, you'd have to talk to the sales guys over at Linbit for more details. Don't know how it covers multi-site scenarios, usually DRBD works only between 2 locations. The exception is the the DRBD stacked scenarios, when data can be in up to 4 different locations. However in most cases, writes are done only in one locations. For situations where you need to write at the same time in different locations, you'll need at least a cluster aware FS for that. anyone know of any torrent technology for remote [secure] replications that I might be able to use? I'm evaluating the tahoe-lafs currently for that, but I'd like something different. Sorry, no clue. HTH, Dan ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user -- Dan Frincu CCNA, RHCE ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] ds:Uptodate not uptodate
Run a verify. It will show issues, I imagine. The version you like should be primary. drbdadm disconnect and then drbdadm connect (from either node) and they'll sync up. I imagine some changes were made to one of the devices while not under drbd control, so no stale bits were set in the bitmap. Verify will find them all and update the bitmap. Also, be CERTAIN you are not mounting the underlying block devices not via drbd anywhere, ever. Oops, I just reread your message, The systems have been synced twice. How are you performing the sync? The verify/disconnect/reconnect is the fastest/best way to resync nearly identical resources. If you have truly synced the devices and not written to them outside drbd control, the only way they can diverge is hardware failure - rarely undetected. Good luck. Let us know how it goes! Dan From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Two Spirit Sent: Saturday, August 04, 2012 11:14 AM To: drbd-user@lists.linbit.com Subject: [DRBD-user] ds:Uptodate not uptodate Hi, newbie here, I'm trying to figure out what is going on. I've got 2 servers, primary/secondary, disk states are both uptodate, but there is different information on both on both servers. If I make server1 the primary and mount it, I find file1 in the file system. if I make server1 the secondary and server2 the primary, I find file2 in the filesystem, but not file1. The systems have been synced twice. Not sure how to proceed to figure this out. help. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Switching from internal to external meta-disk
And further, the FIRST step in any maintenance should be cat /proc/drbd. You would have seen which node had the current data.

Dan

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Felix Frank
Sent: Wednesday, July 11, 2012 3:33 AM
To: Jean-Baptiste
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Switching from internal to external meta-disk

Hi,

On 07/10/2012 06:59 PM, Jean-Baptiste wrote:
> 9. Doing drbdadm -- --overwrite-data-of-peer primary RES (from the primary node)
> 10. Let the synchronization process finish
> 11. Done
>
> At this step everything is fine, my DBMS was restarted without any warning, nothing seems to go wrong. But ... I have lost 11 days of data in my DBMS.

We've seen similar effects on several occasions on this list. So far, it has always (iirc) been a case of diskless primary.

Have you retained logs from 11 days ago? I'd expect you to find a note around that time stating that your primary detached its backing device.

*If* this assumption is correct, here's what happened then: Your primary happily kept writing data, but it never reached its local HDD. Instead, all changes were written to the secondary's disk only. When you did your changes and overwrote the data of the secondary, you killed your data.

Bottom line is, it's crucial to be mindful of the health state of your resources. Ideally, monitoring should report whenever your disks are not UpToDate/UpToDate, among other possible problems.

HTH,
Felix
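The monitoring advice above can be sketched as a small filter over /proc/drbd text. This is an assumption-laden illustration for DRBD 8.x, where each device line carries a ds:local/peer field; the function name drbd_unhealthy is hypothetical. It prints every device line whose disk states are not UpToDate/UpToDate, which would also catch a diskless primary (ds:Diskless/UpToDate).

```shell
#!/bin/sh
# Hypothetical helper: read /proc/drbd-style text on stdin and print every
# device line whose disk states are not UpToDate/UpToDate.
# Exit status is 0 when no such line is found, 1 otherwise.
drbd_unhealthy() {
    awk '
        / ds:/ && !/ ds:UpToDate\/UpToDate/ { print; bad = 1 }
        END { exit bad + 0 }
    '
}

# Usage on a DRBD 8.x node (assumes /proc/drbd exists):
#   drbd_unhealthy < /proc/drbd || echo "DRBD needs attention"
```

A cron job or monitoring check could run the usage line and alert on a non-zero exit status.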
Re: [DRBD-user] Parse error an option keyword expected but got fence peer
That is an ancient DRBD. I've been using drbd for years (at least 3), and I don't remember anything before 8.3.7. I'm running 8.4.1 now. Why are you on such an old version? Maybe the command syntax was different and the messages are simply correct. I don't know about 8.2, but I do know the documentation website is segregated into 8.4.x and 8.3.x sections due to command syntax changes. I may be completely off base, but if command parsing is throwing errors, maybe your configuration is invalid. You didn't post the entire config file (Although I don't know what was or was not valid on that old release). I don't think this will help you much, but there's always hope! (Hope that you'll try a more recent drbd - then I CAN help you out). Of course, some other users may be familiar with that version and help you without you having to install a current version. I believe 8.3.13 is the recommended level. Dan -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Keith Christian Sent: Friday, June 22, 2012 7:47 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] Parse error an option keyword expected but got fence peer I've searched for a solution to this error, lots of hits for Parse error but couldn't find anything specific for fence-peer. I have checked the drbd.conf file for obvious errors like unbalanced braces, and missing semicolons at the end of line. Nothing found. 
Using these RPM's: drbd82-8.2.6-1.el5.centos kmod-drbd82-8.2.6-2 This is on a 64 bit system, so I fixed line 31 which needed lib64 to find the file: ls -l /usr/lib64/heartbeat/drbd-peer-outdater -rwxr-xr-x 1 root root 15984 Feb 6 2008 /usr/lib64/heartbeat/drbd-peer-outdater When running any DRBD command I see this error: drbdadm create-md drbd-resource-0 /etc/drbd.conf:31: Parse error: 'an option keyword' expected, but got 'fence-peer' I commented out line 31, tried to start DRBD again, and saw the error on line 56, removed the comment from line 31, and the error returns to line 31. service drbd start /etc/drbd.conf:56: Parse error: 'an option keyword' expected, but got 'outdated-wfc-timeout' Starting DRBD resources:/etc/drbd.conf:56: Parse error: 'an option keyword' expected, but got 'outdated-wfc-timeout' 53 # Wait for connection timeout if the peer node is already outdated. 54 # (Do not set this to 0, since that means unlimited) 55 # *** 56 outdated-wfc-timeout 2; # 2 seconds. 57# In case there was a split brain situation the devices will 58 # drop their network configuration instead of connecting. Since Below are the first 35 lines of the file, which enclose the line throwing the error: 1 global { usage-count no; } 2 3 resource drbd-resource-0 { 4 protocol C; 5 6 handlers { 7 # what should be done in case the node is primary, degraded 8 # (=no connection) and has inconsistent data. 9 pri-on-incon-degr /usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b /proc/sysrq-trigger ; reboot -f; 10 11 # The node is currently primary, but lost the after split brain 12 # auto recovery procedure. As as consequence it should go away. 13 pri-lost-after-sb /usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b /proc/sysrq-trigger ; reboot -f; 14 15 # In case you have set the on-io-error option to call-local-io-error, 16 # this script will get executed in case of a local IO error. 
It is 17 # expected that this script will case a immediate failover in the 18 # cluster. 19 local-io-error /usr/lib/drbd/notify-local-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o /proc/sysrq-trigger ; halt -f; 20 21 22 # Commands to run in case we need to downgrade the peer's disk 23 # state to Outdated. Should be implemented by the superior 24 # communication possibilities of our cluster manager. 25 # The provided script uses ssh, and is for demonstration/development 26 # purposis. 27 # fence-peer /usr/lib/drbd/outdate-peer.sh on amd 192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12; 28 # 29 # Update: Now there is a solution that relies on heartbeat's 30 # communication layers. You should really use this. *** 31 fence-peer /usr/lib64/heartbeat/drbd-peer-outdater -t 5; 32 # For Pacemaker you might use: 33 # fence-peer /usr/lib/drbd/crm-fence-peer.sh; 34 35 } I'd appreciate any insight or help. == Keith
Re: [DRBD-user] Shrink an LVM partition and create a new one for the /drbd mount ?
Sounds like your root file system is in LogVol01. Makes working on it nearly impossible. A good live-CD with LV tools is rescatux (www.supergrubdisk.org/rescatux). Good luck! I just used it to reorganize some root LVs on RHEL 5. Dan -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Pascal BERTON Sent: Saturday, June 16, 2012 3:32 AM To: 'Keith Christian'; drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Shrink an LVM partition and create a new one for the /drbd mount ? Hi Keith! It does. Don't forget to shrink your filesystem(s) first. You'll find a whole lot of howtos on the net to achieve that depending on the filesystem you actually use. Pascal. -Message d'origine- De : drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] De la part de Keith Christian Envoyé : samedi 16 juin 2012 00:13 À : drbd-user@lists.linbit.com Objet : [DRBD-user] Shrink an LVM partition and create a new one for the /drbd mount ? I have a pre-configured server using LVM, with one large logical volume using all disk space beyond the /boot partition, and a separate /drbd partition is needed. Normally, I remove LVM and create a /drbd partition after formatting the RAID array. That isn't an easy option here, as the machine is pre-configured with the OS, etc. Am I correct that the analog to a physical /drbd partition under LVM is the creation of a logical volume in a volume group? $ lvdisplay --- Logical volume --- LV Name/dev/VolGroup00/LogVol01 VG NameVolGroup00 The lvcreate command failed, not enough free extents: $ lvcreate -v --size 1g --name LogVol02 VolGroup00 Setting logging type to disk Finding volume group VolGroup00 Insufficient free extents (0) in volume group VolGroup00: 32 required I'll have to shrink LogVol01 to make room for a LogVol02 on which /drbd will be mounted. Does this sound like the correct way of creating a partition for /drbd in this situation? Thanks! 
Keith
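The "32 required" in the lvcreate error follows from extent arithmetic: LVM allocates whole physical extents, and a 1 GiB request that needs 32 of them implies 32 MiB extents in VolGroup00. A small sketch of that calculation (the helper name extents_needed is hypothetical, and sizes are taken in MiB for simplicity):

```shell
#!/bin/sh
# Hypothetical helper: number of physical extents needed for a logical
# volume, rounding up because LVM allocates whole extents.
extents_needed() {
    size_mib=$1
    extent_mib=$2
    echo $(( (size_mib + extent_mib - 1) / extent_mib ))
}

# The 1 GiB LogVol02 from the thread, with 32 MiB extents.
extents_needed 1024 32   # prints 32
```

So LogVol01 (and the filesystem inside it) has to shrink by at least that many extents before the new volume fits; vgdisplay reports the extent size and the free extent count for the volume group.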
Re: [DRBD-user] Corosync Configuration
Off OP's topic, but a correction. -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Pascal BERTON Sent: Thursday, June 07, 2012 1:29 PM To: 'Jake Smith'; drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Corosync Configuration I also used this manual to startup, I agree with Jake, that's a typo. Apart from that, William, your bindnetaddr parameter should not end with a 0, it's supposed to be the IP address your local host/node will use to monitor its peers. Instead, replace it by the IP address your server has on network 192.168.1.0. snip Although it is extremely likely that 'should not end with a 0' is a correct statement on this network (it does begin with a 192 and we assumed it is a default 24 bit subnet), it is not a requirement. The OP didn't post his subnet specification. Any network with a subnet mask of 23 bits or less can have a final octet of Zero be a host. In a class A subnet, there is one network (ends with a .0.0.0), 255 hosts that end with two zeros (.0.0) and 65,280 hosts that end with a single zero .0). I normally wouldn't make this minor a correction, but Pascal then said The cool thing (to me) is that I learn something more everyday on this ML... Dan ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
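The subnet arithmetic above can be checked with a short sketch. An address is an ordinary host when it is neither the network address (all host bits zero) nor the broadcast address (all host bits one), so a final octet of 0 is only reserved when the other host bits are zero too. In a /8 that leaves 255 hosts ending in .0.0 and 256 x 255 = 65,280 hosts ending in a single .0. The function names are hypothetical, and the /31 and /32 special cases are ignored.

```shell
#!/bin/sh
# Hypothetical helpers for IPv4 prefix arithmetic.

# Pack a dotted quad into one integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed when the address is neither the network nor the broadcast
# address of its subnet (prefix length given in bits).
is_usable_host() {
    addr=$(ip_to_int "$1")
    prefix=$2
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    network=$(( addr & mask ))
    broadcast=$(( network | (~mask & 0xFFFFFFFF) ))
    [ "$addr" -ne "$network" ] && [ "$addr" -ne "$broadcast" ]
}

# 192.168.1.0 is the network address in a /24 but a host in a /23.
is_usable_host 192.168.1.0 24 || echo "192.168.1.0/24: not a host"
is_usable_host 192.168.1.0 23 && echo "192.168.1.0/23: host"
```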
[DRBD-user] [patch -resend] drbd: fix resync_dump_detail() output
The tests here aren't correct. It should be doing a shift before doing the bitwise AND. (bme->flags & BME_NO_WRITES) is always false and (bme->flags & BME_LOCKED) checks for BME_NO_WRITES instead of checking for locked.

Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
---
I sent this to the drbd-user list in March, but never received a response.

diff --git a/drivers/block/drbd/drbd_proc.c b/drivers/block/drbd/drbd_proc.c
index 2959cdf..ffe1ee4 100644
--- a/drivers/block/drbd/drbd_proc.c
+++ b/drivers/block/drbd/drbd_proc.c
@@ -187,8 +187,10 @@ static void resync_dump_detail(struct seq_file *seq, struct lc_element *e)
 	struct bm_extent *bme = lc_entry(e, struct bm_extent, lce);

 	seq_printf(seq, "%5d %s %s\n", bme->rs_left,
-		   bme->flags & BME_NO_WRITES ? "NO_WRITES" : "-",
-		   bme->flags & BME_LOCKED ? "LOCKED" : "--"
+		   test_bit(BME_NO_WRITES, &bme->flags) ?
+		   "NO_WRITES" : "-",
+		   test_bit(BME_LOCKED, &bme->flags) ?
+		   "LOCKED" : "--"
 		   );
 }
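The bug described in the patch is the difference between a bit number and a bit mask. A sketch in shell arithmetic, assuming the bit numbers implied by the description (BME_NO_WRITES is bit 0, BME_LOCKED is bit 1); the function names are hypothetical stand-ins for the old expression and for test_bit():

```shell
#!/bin/sh
# Bit numbers as implied by the patch description (assumed values).
BME_NO_WRITES=0
BME_LOCKED=1

# Old code: ANDs the flags with the bit *number*, treating it as a mask.
mask_test() { [ $(( $1 & $2 )) -ne 0 ]; }

# Fixed code: shift to the bit position first, as test_bit() does.
bit_test() { [ $(( ($1 >> $2) & 1 )) -ne 0 ]; }

flags=$(( 1 << BME_NO_WRITES ))   # only NO_WRITES is set

mask_test "$flags" "$BME_NO_WRITES" || echo "old test misses NO_WRITES"
mask_test "$flags" "$BME_LOCKED"    && echo "old test wrongly reports LOCKED"
bit_test  "$flags" "$BME_NO_WRITES" && echo "bit test sees NO_WRITES"
bit_test  "$flags" "$BME_LOCKED"    || echo "bit test sees LOCKED is clear"
```

ANDing with 0 can never be true, and ANDing with 1 inspects bit 0, which is why the old output showed NO_WRITES state in the LOCKED column.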
Re: [DRBD-user] Syncer's after dependency for online verifying
> Is it possible to have DRBD verify B only when A is done verifying?

I use crontab.

On Monday at 2:11 AM, verify r0
On Tuesday at 2:11 AM, verify r1
On Wednesday at 2:11 AM, verify r2
On Thursday at 2:11 AM, verify r3

--or--

11 2 * * 1 /sbin/drbdadm verify r0
11 2 * * 2 /sbin/drbdadm verify r1
11 2 * * 3 /sbin/drbdadm verify r2
11 2 * * 4 /sbin/drbdadm verify r3

Dan

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Friday, June 01, 2012 11:50 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Syncer's after dependency for online verifying

On Fri, Jun 01, 2012 at 12:03:20PM +0200, Lionel Sausin wrote:
> Dear DRBD community,
> We're using DRBD 8.3.11 with 2 resources called A and B. B is configured to sync after A, but when I run drbdadm verify all both resources start verifying immediately.

That is intentional. In error scenarios, you do not have much control over when a network fails/recovers, or a node reboots. So there we have dependencies to not overload the system with too much parallel resync activity. For verify, which is entirely triggered from userspace, you have full control.

> Is it possible to have DRBD verify B only when A is done verifying?

Sure. Just don't start the verify on B while the verify on A is still running. ;-)

like,
for minor in 0 1 2 ; do drbdsetup $minor verify ; drbdsetup $minor wait-sync ; done

Or similar.

> If not, has it been/will it be possible in versions after 8.3.11?

I don't see any kernel support for verify dependencies coming soon, as you can easily serialize verify operations in userland. Besides, now that we have the dynamically adaptive resync rate controller, and it is used for verify as well, for most cases I'd recommend enabling it globally and getting rid of the resync dependencies.
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
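The userland serialization suggested in the reply can be wrapped in a small function. This is a sketch, not a tested procedure: it keeps the 8.3-style "drbdsetup <minor> <command>" argument order from the reply, and other releases may expect a different order, so check drbdsetup(8) for the installed version. The function name verify_serially is hypothetical.

```shell
#!/bin/sh
# Verify DRBD minors one at a time: start a verify, block until it has
# finished, then move on to the next minor. Stops at the first failure.
# Argument order follows the 8.3-style example in the reply above.
verify_serially() {
    for minor in "$@"; do
        drbdsetup "$minor" verify || return 1
        drbdsetup "$minor" wait-sync || return 1
    done
}

# Usage (commented out so that sourcing this file changes nothing):
#   verify_serially 0 1 2
```

A single cron entry calling such a function would replace one entry per resource, at the cost of one long-running job.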
Re: [DRBD-user] startup hang after yes
Machines are now in semi-production (manual start/stop for obvious reasons). The issue still occurs. If I remove the replication cable and boot the secondary machine (with 4, up to date resources), the boot process hangs after I reply yes to the prompt. Reinserting the cable does allow the startup scripts to continue, with the error message waitpid: Interrupted system call, but simply replying yes is supposed to do so, with no error. I don't recall seeing this problem before, and I've been through about 4 drbd release levels. Dan (the top poster) -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Dan Barker Sent: Saturday, May 26, 2012 1:23 PM To: drbd List Subject: [DRBD-user] startup hang after yes I'm building a new drbd machine. I put 8.4.0 on a Debian 603 and all looks fine except ... Since I'm testing, I don't have another node. I did create-md and then primary --force. At boot time, there is no peer, so I get the count-up to yes. When I enter yes, nothing happens. If I ssh in and stop/start drbd, all is normal and my initialization scripts finally run (the ones after drbd). What can I do to stop the hang? Other possibly mitigating factors: There is no Ethernet cable connected to the NIC for DRBD synchronization. There is only one drbd resource defined, drbd3 (no 0, 1 or 2). I chose 8.4.0 to match the peer in the environment. I thought about 8.4.1 or 8.3.13, but I'll just update everybody to 9 soon. Dan ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Problem between two nodes but only on 1 ressource
You want to preserve the data on node1, not on node2. You may have been reading the doc from the point of view of the wrong side. On node1 you want to discard the peer's data. -- or -- On node2 you want to discard my data. The syntax is different from 8.3 to 8.4 but I'm sure you have that handy. Please tell me you are not accessing /www from node2 (except in an emergency). If you are accessing the resource from both nodes at once, that is a whole different matter and will cause split-brain all the time (unless you are using a cluster-aware file system, and it sounds like you are not). Dan -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of fmorcamp Sent: Tuesday, May 29, 2012 9:01 AM To: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Problem between two nodes but only on 1 ressource Hi Florian, Thanks for answear. I already see this commands but I wouldn't do it cause I'm am in production env. So my node1 is ok but the ressource www is not duplicate on the node2, so my node2 didn't have all latest files of the node1. And if I understand well, the command will put my node2 as primary. But I do that, it will remove and modify all files which have been removed or modify on my node1 at the next sync no? Fabien Florian Haas-5 wrote: Somebody have a way for me?! http://www.drbd.org/users-guide-8.3/s-resolve-split-brain.html Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user -- View this message in context: http://old.nabble.com/Problem-between-two-nodes-but-only-on-1-ressource-tp33 771275p33924925.html Sent from the DRBD - User mailing list archive at Nabble.com. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
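For reference, the syntax difference mentioned above can be sketched as follows. The commands are the ones given in the DRBD user's guide for manual split-brain recovery; they run on the node whose changes are to be discarded (node2 in this thread), and the surviving node only needs drbdadm connect if it is StandAlone. The function name is hypothetical, and it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the commands to run on the split-brain
# "victim" (the node whose data is discarded) for a given DRBD series.
split_brain_victim_cmds() {
    res=$1
    series=$2   # "8.3" or "8.4"
    case $series in
        8.3) connect="drbdadm -- --discard-my-data connect $res" ;;
        8.4) connect="drbdadm connect --discard-my-data $res" ;;
        *)   echo "unknown DRBD series: $series" >&2; return 1 ;;
    esac
    echo "drbdadm secondary $res"
    echo "$connect"
}

# The www resource from this thread, on an 8.3 node2.
split_brain_victim_cmds www 8.3
```

Running these on the wrong node discards the good copy, so confirm which side holds the current data before executing anything.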
[DRBD-user] startup hang after yes
I'm building a new drbd machine. I put 8.4.0 on a Debian 603 and all looks fine except ... Since I'm testing, I don't have another node. I did create-md and then primary --force. At boot time, there is no peer, so I get the count-up to yes. When I enter yes, nothing happens. If I ssh in and stop/start drbd, all is normal and my initialization scripts finally run (the ones after drbd). What can I do to stop the hang? Other possibly mitigating factors: There is no Ethernet cable connected to the NIC for DRBD synchronization. There is only one drbd resource defined, drbd3 (no 0, 1 or 2). I chose 8.4.0 to match the peer in the environment. I thought about 8.4.1 or 8.3.13, but I'll just update everybody to 9 soon. Dan ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] I need Reset DRBD Service
On the primary, execute the command drbdadm connect all. That will either work, or put meaningful messages into the log. If they connect, you are fine. However, I suspect they will not connect. Look in the log (dmesg) for drbd messages. They will tell the reason for the connection not being established. Dan From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Cristian Caceres Sent: Thursday, May 17, 2012 5:15 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] I need Reset DRBD Service Hi all, I have little experience in drbd, in fact I received as a legacy a system with this implementation, my problem is that one of the nodes, the secondary, we had to restart, but now I see they are not connected according to, I have sought some solution without success, please if someone can help me decipher this I would appreciate. the status of each server is the following: Primary Server: drbd driver loaded OK; device status: version: 8.2.6 (api:88/proto:86-88) GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by r...@tschawytscha.multiexportfoods.com, 2008-12-23 13:00:05 m:res cs st ds p mounted fstype 0:??not-found?? StandAlone Primary/Unknown UpToDate/DUnknown - Secondary Server: drbd driver loaded OK; device status: version: 8.2.6 (api:88/proto:86-88) GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by r...@tschawytscha.multiexportfoods.com, 2008-12-23 13:00:05 m:res cs st ds p mounted fstype 0:??not-found?? WFConnection Secondary/Unknown UpToDate/DUnknown B Thanks.. rca ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] drbd wrong lower device doubt
The configurations SHOULD match to simplify maintenance, but server A will completely ignore server B settings, and server B will completely ignore server A settings. Your config is being read by the servers as: Host A: resource r0 { on ME { device /dev/drbd0; disk /dev/vg01/share; address 2.2.2.150:7788; meta-disk internal; } } and Host B is like this: resource r0 { on ME { device /dev/drbd0; disk /dev/vg02/share; address 2.2.2.151:7788; meta-disk internal; } } Which is (I believe) what you wanted. Dan From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of ?? Sent: Friday, May 18, 2012 3:20 AM To: drbd-user Subject: [DRBD-user] drbd wrong lower device doubt Hi All, I have some doubts about drbd. I have configure two servers as Host A and Host B. Host A drbd configuration is like this: resource r0 { on A { device /dev/drbd0; disk /dev/vg01/share; address 2.2.2.150:7788; meta-disk internal; } on B { device /dev/drbd0; disk /dev/vg01/share; address 2.2.2.151:7788; meta-disk internal; } } and Host B is like this: resource r0 { on A { device /dev/drbd0; disk /dev/vg02/share; address 2.2.2.150:7788; meta-disk internal; } on B { device /dev/drbd0; disk /dev/vg02/share; address 2.2.2.151:7788; meta-disk internal; } } You can notice that Host A and Host B configuration file is not same. Actually Host A lower device is /dev/vg01/share and Host B lower device is /dev/vg02/share. The specified destination lower device is wrong in each server. Network setting is right . I set Host A disk state to UpToDate and Host B disk state inconsistent. I find that Host A is syncing to Host B. Why it can work regularly when I configure wrong lower device. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Primary/Unknown UpToDate/Outdated
ON the Primary, do a drbdadm connect resource or drbdadm connect all. Dan From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Norman Sent: Monday, April 30, 2012 1:45 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] Primary/Unknown UpToDate/Outdated Hi All, I'm new to the list. I don't deal with drbdadm all the time because drbd just runs (usually). I've got a weird state with my drbd (2 nodes). Everything was fine till, I shutdown my 'secondary' for a few minutes just to re-seat a drive. Now, the Primary says... 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/Outdated r ns:7478380 nr:12656820 dw:22139484 dr:17976297 al:52974 bm:12391 lo:4 pe:0 ua:0 ap:3 ep:1 wo:b oos:795380 Secondary says... 1: cs:WFConnection ro:Secondary/Unknown ds:Outdated/DUnknown C r ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 Nothing really changed, both servers can ping each other via the drbd channel. How do I get the servers connected again and the 'secondary' to be 'UpToDate' again. Is it easily re-synced. Help Please. Thanks, Norman ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] multipathd ignores my drbd0
I've tried everything I can think of to get mapper to pick up the drbd0 device, and all failed. I put in a symlink (ln -s /dev/drbd0 /dev/mapper/DRBD0) but Oracle VM doesn't see it. I'm guessing it's requiring multipath to access the resource. I'll try some Logical Volumes and see if I can make Oracle VM see those. I know I can build a PV/VG/LV from a drbd resource. Running an iSCSI target and initiator on localhost on this machine is getting a bit too weird, even for Oracleg. Thanks for the tips! Dan -Original Message- From: Kaloyan Kovachev [mailto:kkovac...@varna.net] Sent: Tuesday, April 10, 2012 9:53 AM To: Dan Barker Cc: drbd List Subject: Re: [DRBD-user] multipathd ignores my drbd0 I think you can't use DRBD device directly with multipath, but if you export it via iSCSI and then import it back it is possible. Another option is LVM over DRBD - device in /dev/mapper Another one is just a symlink in /dev/mapper from udev rule On Tue, 10 Apr 2012 09:41:13 -0400, Dan Barker dbar...@visioncomm.net wrote: How do I get multipathd to notice my drbd block devices? RHEL5 (Oracle VM 3.0.3.127, actually), 2.6.32.21-45xen. Drbd 8.4.1: multipath-tools says v0.4.9. I can't seem to find the multipath version. resource r0 { on OVMPam { volume 0 { device /dev/drbd0 minor 0; disk /dev/sdb; meta-diskinternal; } address ipv4 172.30.0.167:7789; } on DRPam { volume 0 { device /dev/drbd0 minor 0; disk /dev/sdb; meta-diskinternal; } address ipv4 172.30.0.170:7789; } startup { become-primary-on OVMPam; } } As-distributed global. All multipath commands reply Apr 09 08:10:50 | DM multipath kernel driver not loaded, as if no devices were detected at boot time. I have drbd start before multipathd. Relax, I have no plans to multipath to this device, Oracle VM looks only in /dev/mapper for repository candidates. New 10 April: Ping! Anyone have any ideas? Even a different forum in which to ask? 
Re: [DRBD-user] Unable to perform initial sync
I do not understand what you did or are trying to do. Being in sync has no direction. If you are in sync from Primary to Secondary, you are in sync, period. There is no reason to think about a direction. To do a recovery, when one of the resources is to be used as the sync source because it has more current data, there are options for that: discard-my-data, overwrite-data-of-peer, etc. They have no inherent direction, but have complementary meanings depending on which node the command is run on. Please explain in more detail what you are trying to accomplish: which node is primary and which is secondary, which device is considered current, and which commands you are issuing on which host. Dan -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Vyacheslav Karpukhin Sent: Tuesday, April 10, 2012 12:23 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] Unable to perform initial sync Hi. I just installed drbd and am now trying to perform the initial sync. It works fine in one direction, but if I try to perform it in the opposite direction, I get this: Apr 10 11:13:54 web kernel: block drbd0: Becoming sync source due to disk states. Apr 10 11:13:54 web kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) Apr 10 11:13:54 web kernel: block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0% Apr 10 11:13:54 web kernel: block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0% Apr 10 11:13:54 web kernel: block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 Apr 10 11:13:54 web kernel: block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0) Apr 10 11:13:54 web kernel: block drbd0: conn( WFBitMapS -> SyncSource ) Apr 10 11:13:54 web kernel: block drbd0: Began resync as SyncSource (will sync 15519040 KB [3879760 bits set]).
Apr 10 11:13:54 web kernel: block drbd0: updated sync UUID AF78EBCA7F218B01:C237DF3A275A375B:C236DF3A275A375B:C235DF3A275A375B Apr 10 11:13:54 web kernel: block drbd0: /dev/shm/drbd-8.4.1/drbd/drbd_receiver.c:2541: sector: 0s, size: 4194304 Apr 10 11:13:54 web kernel: d-con r0: error receiving RSDataRequest, e: -22 l: 0! Apr 10 11:13:54 web kernel: d-con r0: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError ) Apr 10 11:13:54 web kernel: d-con r0: asender terminated Apr 10 11:13:54 web kernel: d-con r0: Terminating asender thread What's wrong? Thank you. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
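To illustrate how an option like discard-my-data means different things depending on the node it is run on, here is a minimal sketch of the usual manual recovery sequence in 8.4 syntax. It only prints the commands. The function name and the victim/survivor labels are assumptions for illustration, and this is not a diagnosis of the protocol error in the log above.

```shell
#!/bin/sh
# Sketch: print (not run) the commands for each side of a manual recovery.
# "victim"   = node whose local changes are thrown away (becomes sync target)
# "survivor" = node whose data is kept (becomes sync source)
recovery_commands() {
    # $1 = role in the recovery (victim|survivor), $2 = resource name
    case "$1" in
        victim)
            echo "drbdadm secondary $2"
            echo "drbdadm connect --discard-my-data $2"
            ;;
        survivor)
            echo "drbdadm connect $2"
            ;;
        *)
            echo "usage: recovery_commands victim|survivor RESOURCE" >&2
            return 1
            ;;
    esac
}
```

On 8.3 the option is passed before the subcommand instead, as in drbdadm -- --discard-my-data connect r0.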
Re: [DRBD-user] multipathd ignores my drbd0
Kaloyan: I can't get Oracle VM to look at the Logical Volume either (building a lv on drbd was dirt-simple). I think I'm hosed. I'll return to the Oracle forum and see what they say about LVM, but unless I can get the drbd resource to be presented by multipath, I don't think Oracle VM will use it. Thanks for the help, none the less. Dan -Original Message- From: Kaloyan Kovachev [mailto:kkovac...@varna.net] Sent: Tuesday, April 10, 2012 9:53 AM To: Dan Barker Cc: drbd List Subject: Re: [DRBD-user] multipathd ignores my drbd0 I think you can't use DRBD device directly with multipath, but if you export it via iSCSI and then import it back it is possible. Another option is LVM over DRBD - device in /dev/mapper Another one is just a symlink in /dev/mapper from udev rule On Tue, 10 Apr 2012 09:41:13 -0400, Dan Barker dbar...@visioncomm.net wrote: How do I get multipathd to notice my drbd block devices? RHEL5 (Oracle VM 3.0.3.127, actually), 2.6.32.21-45xen. Drbd 8.4.1: multipath-tools says v0.4.9. I can't seem to find the multipath version. resource r0 { on OVMPam { volume 0 { device /dev/drbd0 minor 0; disk /dev/sdb; meta-diskinternal; } address ipv4 172.30.0.167:7789; } on DRPam { volume 0 { device /dev/drbd0 minor 0; disk /dev/sdb; meta-diskinternal; } address ipv4 172.30.0.170:7789; } startup { become-primary-on OVMPam; } } As-distributed global. All multipath commands reply Apr 09 08:10:50 | DM multipath kernel driver not loaded, as if no devices were detected at boot time. I have drbd start before multipathd. Relax, I have no plans to multipath to this device, Oracle VM looks only in /dev/mapper for repository candidates. New 10 April: Ping! Anyone have any ideas? Even a different forum in which to ask? ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
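For reference, the "LVM over DRBD" layering discussed in this thread can be sketched as below. Dan reports that Oracle VM still did not pick up the resulting volume, so this only illustrates the layering, not a fix. The volume group and logical volume names are made up, and the function prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# Sketch: stack a PV/VG/LV on top of a DRBD device.
# Dry run by default: each command is echoed. Call with RUN= (empty) to
# actually execute, as root, on the node where the resource is Primary.
lvm_on_drbd() {
    # $1 = DRBD device, $2 = volume group name, $3 = logical volume name
    ${RUN-echo} pvcreate "$1"
    ${RUN-echo} vgcreate "$2" "$1"
    ${RUN-echo} lvcreate -l 100%FREE -n "$3" "$2"
}
```

With the assumed names, lvm_on_drbd /dev/drbd0 vg_drbd repo would leave the logical volume at /dev/mapper/vg_drbd-repo.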
Re: [DRBD-user] multipathd ignores my drbd0
Yes. In fact I had a bit of bother to get multipath not to grab my DRBD resources. The multipath.conf blacklist regexps didn't seem to work the way I expected. I resorted to putting the scsi-id names in blacklisted.wwids. Of course, that may mean that the drbd resource was blacklisted, but multipath -ll -v3 has zero occurrences of drbd in it. Dan -Original Message- From: Kushnir, Michael (NIH/NLM/LHC) [C] [mailto:michael.kush...@nih.gov] Sent: Tuesday, April 10, 2012 1:23 PM To: Dan Barker; 'drbd List' Subject: RE: [DRBD-user] multipathd ignores my drbd0 Did you check multipath blacklists? -Michael -Original Message- From: Dan Barker [mailto:dbar...@visioncomm.net] Sent: Tuesday, April 10, 2012 12:40 PM To: 'drbd List' Subject: Re: [DRBD-user] multipathd ignores my drbd0 I've tried everything I can think of to get mapper to pick up the drbd0 device, and all failed. I put in a symlink (ln -s /dev/drbd0 /dev/mapper/DRBD0) but Oracle VM doesn't see it. I'm guessing it's requiring multipath to access the resource. I'll try some Logical Volumes and see if I can make Oracle VM see those. I know I can build a PV/VG/LV from a drbd resource. Running an iSCSI target and initiator on localhost on this machine is getting a bit too weird, even for Oracleg. Thanks for the tips! Dan -Original Message- From: Kaloyan Kovachev [mailto:kkovac...@varna.net] Sent: Tuesday, April 10, 2012 9:53 AM To: Dan Barker Cc: drbd List Subject: Re: [DRBD-user] multipathd ignores my drbd0 I think you can't use DRBD device directly with multipath, but if you export it via iSCSI and then import it back it is possible. Another option is LVM over DRBD - device in /dev/mapper Another one is just a symlink in /dev/mapper from udev rule On Tue, 10 Apr 2012 09:41:13 -0400, Dan Barker dbar...@visioncomm.net wrote: How do I get multipathd to notice my drbd block devices? RHEL5 (Oracle VM 3.0.3.127, actually), 2.6.32.21-45xen. Drbd 8.4.1: multipath-tools says v0.4.9. 
I can't seem to find the multipath version. resource r0 { on OVMPam { volume 0 { device /dev/drbd0 minor 0; disk /dev/sdb; meta-diskinternal; } address ipv4 172.30.0.167:7789; } on DRPam { volume 0 { device /dev/drbd0 minor 0; disk /dev/sdb; meta-diskinternal; } address ipv4 172.30.0.170:7789; } startup { become-primary-on OVMPam; } } As-distributed global. All multipath commands reply Apr 09 08:10:50 | DM multipath kernel driver not loaded, as if no devices were detected at boot time. I have drbd start before multipathd. Relax, I have no plans to multipath to this device, Oracle VM looks only in /dev/mapper for repository candidates. New 10 April: Ping! Anyone have any ideas? Even a different forum in which to ask? ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Unable to perform initial sync
That is part of the story. You most likely have some protocol issues (thus the log entries). Why would you experiment with cross-version drbd? You should have the easiest results with both servers at the same, recent level. To help with this problem, folks would need the kernel module and userland program versions on both servers, the commands run, and the relevant dmesg logs from both sides. ... BTW ... Are you aware that if you do not care about the contents of the disks, you don't have to sync all the zeros? You can create up-to-date disks instantly, and then put a file system on them. Everything will be in sync. The first verify will find a bunch of out-of-sync blocks, but they are in the filesystem's free space and are synced by simply doing a disconnect/reconnect on the secondary node. It really speeds up initial setup, especially with multi-terabyte resources. (See new-current-uuid in http://www.drbd.org/users-guide-8.3/re-drbdsetup.html; syntax may be different on 8.3 versions). Dan -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Vyacheslav Karpukhin Sent: Tuesday, April 10, 2012 2:49 PM To: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Unable to perform initial sync On 11.04.2012, at 0:36, Dan Barker wrote: I do not understand what you did or are trying to do. Being in sync has no direction. If you are in sync from Primary to Secondary, you are in sync, period. There is no reason to think about a direction. I'm talking about direction because in my case sync from server B to server A works fine, but from server A to server B it fails. Since it's the initial sync of a newly created volume, it doesn't matter to me which host becomes Primary and which Secondary. So that initial sync may be performed in two directions: from server A to server B, or from server B to server A.
In my case when I mark server B as primary, everything is fine, drbd synchronizes from B to A. But if instead I mark server A as primary, synchronization won't perform -- there are protocol errors in the log. I tried to use different versions of drbd, and found out that this issue starts with 8.3.11. Right now drbd performs synchronization from A to B with 8.3.10, but I couldn't make it do that with 8.3.11, 8.3.12 and 8.4.1. In my experiments each time I do the following: 1) create-md on both servers 2) starting the resource on both servers 3) marking one of the servers as primary ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
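The shortcut Dan describes (skipping the initial sync when the disk contents do not matter) can be sketched as follows. This assumes the 8.4 drbdadm syntax; the function names and the resource/volume arguments are illustrative. Commands are only printed unless RUN is set to an empty string.

```shell
#!/bin/sh
# Sketch: skip the initial full sync on a brand-new resource whose
# contents do not matter. Dry run by default (commands are echoed).

# Run on BOTH nodes: write metadata and bring the resource up.
prepare_node() {
    # $1 = resource name
    ${RUN-echo} drbdadm create-md "$1"
    ${RUN-echo} drbdadm up "$1"
}

# Run on ONE node only, after both nodes are up and connected:
# generate a new current UUID and clear the bitmap, so both sides
# are treated as UpToDate without copying any data.
skip_initial_sync() {
    # $1 = resource name, $2 = volume number
    ${RUN-echo} drbdadm new-current-uuid --clear-bitmap "$1/$2"
}
```

On 8.3 the equivalent is usually written drbdadm -- --clear-bitmap new-current-uuid r0. After this step one node can be promoted and a filesystem created, as described in the message above.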
Re: [DRBD-user] multipathd ignores my drbd0
Yes, I've about come to the conclusion that multipath will not work with a drbd resource, and Oracle VM will not work with a non-multipath resource. So, no DRBD LVM Multipath Oracle VM either. Dang! Of course multipath will work with the underlying device, but then drbd cannot replicate it except offline and without benefit of the activity log or bitmap. Close to useless. Looking like a second server (or SAN) is going to be required to run any sort of DR other than occasional [incremental] backups managed manually. Sigh, ... -Original Message- From: Pascal BERTON [mailto:pascal.bert...@free.fr] Sent: Tuesday, April 10, 2012 3:30 PM To: 'Dan Barker'; 'drbd List' Subject: RE: [DRBD-user] multipathd ignores my drbd0 Dan, Isn't it the backing device you should refer ? To me, if you want multipath to work correctly, the device has to return an ID. Having said that, if I try it on my resource having the following config file : resource vmfs2 { device /dev/drbd1; disk /dev/vg_vmfs2/lv_vmfs2; meta-disk internal; net { } disk { } on ipstore11 { address 195.165.5.245:7791; } on ipstore21 { address 195.165.5.247:7791; } } If I try : [root@ipstore21 ~]# scsi_id --whitelisted --device=/dev/vg_vmfs2/lv_vmfs2 3600508b1001c3a60557b6713c82b915c [root@ipstore21 ~]# Now, if I try : [root@ipstore21 ~]# scsi_id --whitelisted --device=/dev/drbd1 [root@ipstore21 ~]# As far as I understand, if no SCSI ID, multipathd can't identify the device, so no multipath... But configuring it with the backing device should work... HTH. Regards, Pascal. -Message d'origine- De : drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] De la part de Dan Barker Envoyé : mardi 10 avril 2012 21:08 À : 'drbd List' Objet : Re: [DRBD-user] multipathd ignores my drbd0 Yes. In fact I had a bit of bother to get multipath not to grab my DRBD resources. The multipath.conf blacklist regexps didn't seem to work the way I expected. I resorted to putting the scsi-id names in blacklisted.wwids. 
Of course, that may mean that the drbd resource was blacklisted, but multipath -ll -v3 has zero occurrences of drbd in it. Dan -Original Message- From: Kushnir, Michael (NIH/NLM/LHC) [C] [mailto:michael.kush...@nih.gov] Sent: Tuesday, April 10, 2012 1:23 PM To: Dan Barker; 'drbd List' Subject: RE: [DRBD-user] multipathd ignores my drbd0 Did you check multipath blacklists? -Michael -Original Message- From: Dan Barker [mailto:dbar...@visioncomm.net] Sent: Tuesday, April 10, 2012 12:40 PM To: 'drbd List' Subject: Re: [DRBD-user] multipathd ignores my drbd0 I've tried everything I can think of to get mapper to pick up the drbd0 device, and all failed. I put in a symlink (ln -s /dev/drbd0 /dev/mapper/DRBD0) but Oracle VM doesn't see it. I'm guessing it's requiring multipath to access the resource. I'll try some Logical Volumes and see if I can make Oracle VM see those. I know I can build a PV/VG/LV from a drbd resource. Running an iSCSI target and initiator on localhost on this machine is getting a bit too weird, even for Oracleg. Thanks for the tips! Dan -Original Message- From: Kaloyan Kovachev [mailto:kkovac...@varna.net] Sent: Tuesday, April 10, 2012 9:53 AM To: Dan Barker Cc: drbd List Subject: Re: [DRBD-user] multipathd ignores my drbd0 I think you can't use DRBD device directly with multipath, but if you export it via iSCSI and then import it back it is possible. Another option is LVM over DRBD - device in /dev/mapper Another one is just a symlink in /dev/mapper from udev rule On Tue, 10 Apr 2012 09:41:13 -0400, Dan Barker dbar...@visioncomm.net wrote: How do I get multipathd to notice my drbd block devices? RHEL5 (Oracle VM 3.0.3.127, actually), 2.6.32.21-45xen. Drbd 8.4.1: multipath-tools says v0.4.9. I can't seem to find the multipath version. 
resource r0 { on OVMPam { volume 0 { device /dev/drbd0 minor 0; disk /dev/sdb; meta-diskinternal; } address ipv4 172.30.0.167:7789; } on DRPam { volume 0 { device /dev/drbd0 minor 0; disk /dev/sdb; meta-diskinternal; } address ipv4 172.30.0.170:7789; } startup { become-primary-on OVMPam; } } As-distributed global. All multipath commands reply Apr 09 08:10:50 | DM multipath kernel driver not loaded, as if no devices were detected at boot time. I have drbd start before multipathd. Relax, I have no plans to multipath to this device, Oracle VM looks only in /dev/mapper for repository candidates. New 10 April: Ping! Anyone have any ideas? Even a different forum in which to ask? ___ drbd-user mailing list drbd
[DRBD-user] multipathd ignores my drbd0
How do I get multipathd to notice my drbd block devices? RHEL5 (Oracle VM 3.0.3.127, actually), 2.6.32.21-45xen. Drbd 8.4.1: multipath-tools says v0.4.9. I can't seem to find the multipath version. resource r0 { on OVMPam { volume 0 { device /dev/drbd0 minor 0; disk /dev/sdb; meta-disk internal; } address ipv4 172.30.0.167:7789; } on DRPam { volume 0 { device /dev/drbd0 minor 0; disk /dev/sdb; meta-disk internal; } address ipv4 172.30.0.170:7789; } startup { become-primary-on OVMPam; } } As-distributed global. All multipath commands reply "Apr 09 08:10:50 | DM multipath kernel driver not loaded", as if no devices were detected at boot time. I have drbd start before multipathd. Relax, I have no plans to multipath to this device; Oracle VM looks only in /dev/mapper for repository candidates. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
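The "DM multipath kernel driver not loaded" reply suggests a first check: is the dm_multipath module loaded at all? A minimal sketch, with a hypothetical helper name; the optional second argument exists only so the check can be tried against a sample file.

```shell
#!/bin/sh
# Return success if the named module appears in a /proc/modules-style file.
module_loaded() {
    # $1 = module name as listed in /proc/modules (e.g. dm_multipath)
    # $2 = file to read (defaults to /proc/modules)
    grep -q "^$1 " "${2:-/proc/modules}"
}
```

Typical use would be module_loaded dm_multipath || modprobe dm-multipath. Whether multipath then accepts a DRBD device is a separate question; the rest of the thread suggests it does not.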
[DRBD-user] How to create WFConnection resource with one DRBD host?
I need to test that DRBD will peacefully cohabit with Oracle VM. I want to build a single-node DRBD array and need my resource in WFConnection Primary/Unknown UpToDate/DUnknown. Otherwise I need two drbds (I can do that, but it's not germane to this test). I did a create-md (successful) and a drbdadm new-current-uuid minor-0 (failed): [ 377.048406] d-con r0: conn( WFConnection -> Disconnecting ) [ 377.048416] d-con r0: Discarding network configuration. [ 377.048559] d-con r0: Connection closed [ 377.048561] d-con r0: conn( Disconnecting -> StandAlone ) [ 377.048563] d-con r0: receiver terminated [ 377.048576] d-con r0: Terminating receiver thread [ 377.048594] block drbd0: disk( Inconsistent -> Failed ) [ 377.057027] block drbd0: disk( Failed -> Diskless ) [ 377.057132] block drbd0: drbd_bm_resize called with capacity == 0 [ 377.057229] d-con r0: Terminating worker thread This is 8.4.1. What did I miss? ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
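One common way to reach the state asked for here (Primary, UpToDate, waiting for a peer that never shows up) is to force-promote the lone node. The sketch below assumes 8.4 syntax and a resource named by the caller; it prints the commands unless RUN is set to an empty string. It is not a diagnosis of the log above.

```shell
#!/bin/sh
# Sketch: bring up a resource on a single node with no peer available.
# Dry run by default (commands are echoed); call with RUN= to execute.
single_node_bringup() {
    # $1 = resource name
    ${RUN-echo} drbdadm create-md "$1"        # write fresh metadata
    ${RUN-echo} drbdadm up "$1"               # attach disk, wait for peer
    ${RUN-echo} drbdadm primary --force "$1"  # promote despite Inconsistent disk
}
```

Forcing the promotion declares the local data good, so it is only appropriate on a new, empty resource such as this test setup.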
Re: [DRBD-user] drbd storage for Oracle VM
Eduardo, thank you for your input, but Virtual Box (workstation product) is a completely different animal than Oracle VM (bare-metal hypervisor). I don't get to choose the underlying storage, and OVM3 choses OCFS2. I guess I'll simply have to try it out and see if DRBD will compile/run/etc. It will. However OCFS2 will only work on Protocol C (dual-Primary mode requires synchronous replication by definition), so your original plan of using protocol A or B is out. Why do you need to compile though? I thought UEK2 shipped with DRBD. Or has that not been made available for Oracle VM yet? Florian That sounds encouraging. Not having to compile will mean I don't have to locate all the dependencies. UEK2 shipped with DRBD. How would I check? modinfo drbd; which drbdadm, or yum search drbd. I had already tried drbdadm --help and got command not found. Modinfo came up with could not find module drbd. yum requires a repository to do anything. I tried http://public-yum.oracle.com/repo/OracleLinux/OL5/latest/x86_64/ but drbd wasn't there. So I tried to compile. I found gcc, make and flex, but am coming up empty on linux-headers-2.6.32.21-45xen. I don't believe I can build a drbd without the headers. Any ideas? (I also posted the question on the Oracle VM forum at forums.oracle.com) My commands: ./configure --with-km --prefix /usr --sysconfdir /etc --localstatedir /var make clean all Result: Userland tools build was successful. SORRY, kernel makefile not found. You need to tell me a correct KDIR, Or install the neccessary kernel source packages. (as expected). Any ideas on finding kernel headers? I have no need for dual-primary and I can't see how OCFS2 can even tell if the underlying block device is dual-primary or single-primary. Because the device wouldn't be writable on one of the nodes, which OCFS2 is sure to choke on? Well, before a failure, xen won't even be running. I don't kwow if OCFS2 will need write permissions on a volume not mounted. I'd expect not. 
The stand-by node will bring up drbd secondary and xen not at all. Failover will include assuring the primary node is down or at least offline, 'drbdadm primary r0' at the warm site, and then start xen (which will mount the OCFS2 volume as part of its initialization - at least that's what I'm guessing will work. I can't test it until I can get the kernel modules compiled. I believe Eduardo was using dual-primary. I simply need to do a warm-failover (manual) in the event of a disaster. Probably need to do OCFS2-on-iSCSI with iSCSI-on-DRBD then. Florian I don't see the need for that level of complexity. Before we'd do that many levels of misdirection on a single host (SAS/Raid/DRBD/IET/OVM/OCFS2), I think a SAN would be a better choice. My goal is to get this running in a single box (per site). I got an answer from forums.oracle.com. There is a development platform for this sort of thing. Since I heard nothing from the DRBD community, I'll assume I'm in new territory. I'll report back after I build a drbd kernel rpm (or the 3.1.1 Oracle VM comes out with drbd built in - it's to use the new UEK2 kernel). REF: https://forums.oracle.com/forums/thread.jspa?threadID=2368799 ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
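The "kernel makefile not found ... correct KDIR" error above means the build could not locate the kernel build tree. A minimal sketch of a check for the conventional location, /lib/modules/RELEASE/build; that path is an assumption and may not exist on an Oracle VM server until the matching kernel development package is installed. The helper name is hypothetical.

```shell
#!/bin/sh
# Print the kernel build directory for a given release, or fail if absent.
kernel_build_dir() {
    # $1 = kernel release (defaults to the running kernel)
    # $2 = modules root (defaults to /lib/modules; mainly for testing)
    dir="${2:-/lib/modules}/${1:-$(uname -r)}/build"
    if [ -d "$dir" ]; then
        echo "$dir"
    else
        return 1
    fi
}
```

If the directory exists, the build can be pointed at it with something like KDIR=$(kernel_build_dir) && make KDIR="$KDIR".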
Re: [DRBD-user] drbd storage for Oracle VM
Eduardo, thank you for your input, but Virtual Box (workstation product) is a completely different animal than Oracle VM (bare-metal hypervisor). I don’t get to choose the underlying storage, and OVM3 choses OCFS2. I guess I’ll simply have to try it out and see if DRBD will compile/run/etc. Dan From: Eduardo Diaz - Gmail [mailto:ediaz...@gmail.com] Sent: Tuesday, April 03, 2012 5:44 AM To: Dan Barker Cc: drbd List Subject: Re: [DRBD-user] drbd storage for Oracle VM I am using Active/Active drbd with virtualbox ocfs2, if you have any questions feel free to ask. My experience is if you don't need ocfs2 don't use... is better use reiserfs or xfs and XEN (if you don't know abour virtualization you can use virtualbox, it is more easy). If you need more info fell free contact by chat in gmail. regards! On Sat, Mar 31, 2012 at 7:57 PM, Dan Barker dbar...@visioncomm.net wrote: I'm looking at a Primary/Secondary, Protocol B (or maybe A with Proxy) for a warm-DR site of a one-host, 4VM Oracle VM 3.0.3 system. Anyone got any experience with OVM 3 and DRBD? I was planning to put DRBD on the OVM 3 host and have it provide /dev/drbd0 to OVM for its repository (OVM3 Repositories are OCFS2, in case you are not familiar). I have not even checked that it will compile, but if someone has done it before that would be good to know. Specific hardware is a dual-XEON 5690 box with an RAID 1+0 array Areca 1222 (multi-terabyte), 4 Gig NICs and 96G Ram in a 2U enclosure. Dan, in Atlanta ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] drbd storage for Oracle VM
Eduardo, thank you for your input, but Virtual Box (workstation product) is a completely different animal than Oracle VM (bare-metal hypervisor). I don't get to choose the underlying storage, and OVM3 choses OCFS2. I guess I'll simply have to try it out and see if DRBD will compile/run/etc. It will. However OCFS2 will only work on Protocol C (dual-Primary mode requires synchronous replication by definition), so your original plan of using protocol A or B is out. Why do you need to compile though? I thought UEK2 shipped with DRBD. Or has that not been made available for Oracle VM yet? Florian That sounds encouraging. Not having to compile will mean I don't have to locate all the dependencies. UEK2 shipped with DRBD. How would I check? I have no need for dual-primary and I can't see how OCFS2 can even tell if the underlying block device is dual-primary or single-primary. I believe Eduardo was using dual-primary. I simply need to do a warm-failover (manual) in the event of a disaster. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
[DRBD-user] drbd storage for Oracle VM
I'm looking at a Primary/Secondary, Protocol B (or maybe A with Proxy) for a warm-DR site of a one-host, 4VM Oracle VM 3.0.3 system. Anyone got any experience with OVM 3 and DRBD? I was planning to put DRBD on the OVM 3 host and have it provide /dev/drbd0 to OVM for its repository (OVM3 Repositories are OCFS2, in case you are not familiar). I have not even checked that it will compile, but if someone has done it before that would be good to know. Specific hardware is a dual-XEON 5690 box with an RAID 1+0 array Areca 1222 (multi-terabyte), 4 Gig NICs and 96G Ram in a 2U enclosure. Dan, in Atlanta ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] corrupted data disk
Prevent? Good applications. Recover? Good backups. -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Umarzuki Mochlis Sent: Friday, March 16, 2012 2:32 AM To: drbd-user Subject: [DRBD-user] corrupted data disk Will corrupted data or a corrupted filesystem on the primary also get copied to the secondary (garbage in, garbage out?) on drbd 8.3? If yes, how can this be prevented? -- Regards, Umarzuki Mochlis http://debmal.my ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] centos 5 drbd 8 uptodate is false
You don't have to shut down the VMs, but you have to have adequate storage. Build a NEW drbd array. Migrate your VMs to it. Delete the old array. Depending on your storage and VMs, you may be able to stage this in several steps so you don't need all the storage to be duplicated. Assuming you have VMa, VMb and VMc on Disk1, and VMd, VMe and VMf on Disk2, you could: Build a drbd array on New space. Migrate VMa, VMb and VMc to it. Build a drbd array on the space of Disk1. Migrate VMd, VMe and VMf to it. Recover the space of Disk2. hth Dan Barker -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Adi Spivak Sent: Thursday, March 15, 2012 7:24 AM To: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] centos 5 drbd 8 uptodate is false Ok, I now understand what I did wrong. I tried to use drbd as a solution for a production server without turning any of my VMs off; however, I understand that this might not be the solution I am looking for right now, as this is for a new setup only or if I am willing to turn off the VMs... For anyone that is interested, I use an iSCSI device to export the LVM drives to the XEN server. This solution will force me to export the DRBD devices out to XEN instead of the LVs, reboot all of my servers and change all of my config files to reflect the new setup to support DRBD. I will try to look for some other solution, as turning off the production servers is a bit of an issue... Thank you all... On 03/15/2012 12:56 PM, Andreas Kurz wrote: On 03/15/2012 09:20 AM, Adi Spivak wrote: Hi. I am using the drbd that comes with centos.
My centos version is 5.8 My /proc/drbd say as following: version: 8.0.16 (api:86/proto:86) GIT-hash: d30881451c988619e243d6294a899139eed1183d build by mockbu...@v20z-x86-64.home.local, 2009-08-22 13:27:08 1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r--- ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0 act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0 This is definitely an unused device ... no writes since started. And please don't use that ancient version but the latest 8.3.12 from Centos Extras repo. Whatever device you are using in your Xen VMs it is not /dev/drbd1. But if you show us your Xen VM config and the output of lvs command it should be more clear what is going on. Regards, Andreas ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] drbd read-only mode
ro: doesn't mean Read Only in that context. I don't recall offhand what it does mean, but my disks are all read-write and show ro: there. Someone else will chime in with the real meaning, I'm sure. Dan From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of ?? Sent: Thursday, March 15, 2012 10:22 AM To: drbd-user@lists.linbit.com Subject: [DRBD-user] drbd read-only mode Hi, I installed drbd 8.4.1. The cluster operates in primary/primary mode. However, the drives appear to be mounted read-only: [root@noc-1-m77 /]# cat /proc/drbd version: 8.4.1 (api:1/proto:86-100) GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by r...@noc-1-synt.rutube.ru, 2012-03-14 10:05:49 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r- ^^ ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 Why is read-write mode not activated? Config: [root@noc-1-synt /]# cat /etc/drbd.d/r0.res # create new resource r0 { startup { wfc-timeout 20; degr-wfc-timeout 10; # we will keep this commented until tested successfully: become-primary-on both; } net { protocol C; allow-two-primaries; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; } # DRBD device device /dev/drbd0; # physical device disk /dev/vg_noc1synt/lv02; meta-disk internal; on noc-1-synt.rutube.ru { # IP address:port address 10.1.20.10:7788; } on noc-1-m77.rutube.ru { address 10.2.20.9:7788; } } [root@noc-1-synt /]# ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
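For what it is worth, in /proc/drbd the ro: field reports the node roles, local role first and peer role after the slash. A minimal sketch that pulls the two roles out of a status line; the function name is made up for illustration.

```shell
#!/bin/sh
# Print the local and peer roles from one /proc/drbd status line.
roles() {
    # $1 = status line containing a field like ro:Primary/Secondary
    printf '%s\n' "$1" |
        sed -n 's/.*ro:\([A-Za-z]*\)\/\([A-Za-z]*\).*/local=\1 peer=\2/p'
}
```

To check whether the block device itself is read-only, blockdev --getro /dev/drbd0 prints 1 for read-only and 0 for read-write.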
Re: [DRBD-user] Kernel hung on DRBD / MD RAID
On 02/21/2012 12:03 AM, Andreas Bauer wrote: So when vm-master is Primary, vm-slave is Secondary, and I force-detach the backing device on vm-master, DRBD will automatically make vm-slave the Primary and direct writes to that host? No. The secondary remains secondary. However, the primary cannot write to its local disk (seeing as it is detached), so writes are done *only* on the secondary (normally, any write is done on both nodes). When the diskless node eventually gets access to a backing storage device again (the old disk or a new one), it resyncs with the UpToDate one and you're back to normal. Of course, if you lose connectivity or the peer's disk, you're down to no disk, and therefore out of operation. HTH, Felix Thanks, it does help. So a node can be Primary and inconsistent while the opposite node is Secondary and UpToDate, and read requests are (necessarily) satisfied over the network. Didn't know that. regards, Andreas and Writes! ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Backing up VMware VM's running DRBD with VMware Data Recovery (VDR)
Mark said: Snapshot creation requires that VMware tools are installed. This is not correct (at least on ESXi4 and 5). Quiescing a VM requires VMware Tools, VDR may require VMware Tools, but snapshots do not. This doesn't affect the OP's problem, but the incorrect statement needed illumination. Dan in Atlanta

-----Original Message----- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Pascal BERTON (EURIALYS) Sent: Friday, February 17, 2012 6:52 AM To: 'Mark Watts'; drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Backing up VMware VM's running DRBD with VMware Data Recovery (VDR)

Mark, Snapshot creation requires that VMware tools are installed, which brings the sync driver onboard. "Cannot quiesce VM" means it is unable to contact the sync driver. Have you installed the VMware Tools? Best regards, Pascal.

-----Original Message----- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Mark Watts Sent: Friday, February 17, 2012 10:29 To: drbd-user@lists.linbit.com Subject: [DRBD-user] Backing up VMware VM's running DRBD with VMware Data Recovery (VDR)

I have a pair of CentOS 5.7 VM's running an LVM/DRBD/EXT3 Pri/Sec cluster. Since we use VDR to take snapshots of our VM's I naturally added these two VM's to the backup rota. Pretty much every night I get hundreds of errors in the VDR logs relating to the Primary, giving the message: Failed to create snapshot for VDR01, error -3960 ( cannot quiesce virtual machine) I'm taking a wild guess at this perhaps being related to DRBD; can anyone suggest whether this is the issue? Mark.

-- Mark Watts BSc RHCE Senior Systems Engineer, MSS Secure Managed Hosting www.QinetiQ.com QinetiQ - Delivering customer-focused solutions GPG Key: http://www.linux-corner.info/mwatts.gpg
Re: [DRBD-user] DRBD+Pacemaker: Won't promote with only one node
Hi, On Wed, Jan 4, 2012 at 10:10 PM, William Seligman selig...@nevis.columbia.edu wrote: I'll give the technical details in a moment, but I thought I'd start with a description of the problem. I have a two-node active/passive cluster, with DRBD controlled by Pacemaker. I upgraded to DRBD 8.4.x about six months ago (probably too soon); everything was fine. Then last week we did some power-outage tests on our cluster. Each node in the cluster is attached to its own uninterruptible power supply; the STONITH mechanism is to turn off the other node's UPS. In the event of an extended power outage (this happens 2-3 times a year at my site), it's likely that one node will STONITH the other when the other node's UPS runs out of power and shuts it down. This means that when power comes back on, only one node will come back up, since the STONITHed UPS won't turn on again without manual intervention. The problem is that with only one node, Pacemaker+DRBD won't promote the DRBD resource to primary; it just sits there at secondary and won't start up any DRBD-dependent resources. Only when the second node comes back up will Pacemaker assign one of them the primary role. I've confirmed this by shutting down corosync on both nodes, then bringing it up again on just one of them.

Could you also post your Pacemaker configuration? Also, you might want to check http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#id890288 for no-quorum-policy: in two-node clusters, losing one node means you don't have quorum, and unless you have something else as a quorum device, the policy is set to stop. HTH, Dan

I'm pretty sure that this is due to a mistake I've made in my DRBD configuration when I fiddled with it during the 8.4.x upgrade. I've attached the files. Can one of you kind folks spot the error?
Technical details: Two-node configuration: hypatia and orestes OS: Scientific Linux 5.5, kernel 2.6.18-238.19.1.el5xen Packages: drbd-8.4.1-1 corosync-1.2.7-1.1.el5 pacemaker-1.0.12-1.el5.centos openais-1.1.3-1.6.el5 Attached: global_common.conf, nevis-admin.res -- Bill Seligman | Phone: (914) 591-2823 Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu PO Box 137 | Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/ ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user -- Dan Frincu CCNA, RHCE ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
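To illustrate the quorum remark in the reply above, here is a minimal sketch. It assumes plain majority voting with one vote per node and no extra quorum device; the function name `has_quorum` is made up for illustration and is not part of Pacemaker or corosync, whose actual behaviour depends on the stack and its configuration.

```python
def has_quorum(nodes_online: int, nodes_total: int) -> bool:
    """Strict-majority rule (assumption): more than half of all
    configured nodes must be online for the partition to have quorum."""
    return nodes_online > nodes_total / 2


# Compare a two-node and a three-node cluster as nodes drop out.
for total in (2, 3):
    for online in range(total, 0, -1):
        print(f"{online} of {total} nodes online -> quorum: "
              f"{has_quorum(online, total)}")
```

Under this rule a lone survivor in a two-node cluster holds exactly half the votes, which is not a majority, so a no-quorum-policy of stop would keep resources from starting there.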
[DRBD-user] Is VMFS5 cluster aware?
Is the new filesystem for VMWare's vSphere 5 cluster aware? I continue to see references to GFS2 and OCFS2 but never a mention of VMFS5. I'm just curious for now, because I'm running single primary. If I could save the network latency for reads from one host that would be sweet! Dan ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Enabling a stacked resource - HELP!
And on 8.4.0 and up, you can use the same IP and port. See volumes: http://www.drbd.org/users-guide/ap-recent-changes.html Dan

-----Original Message----- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of SlingPirate Sent: Thursday, September 15, 2011 8:28 AM To: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Enabling a stacked resource - HELP!

For anyone using Google and stuck like I was: you can use the same network card, you just need different ports assigned to each resource.

SlingPirate wrote: Thanks for your reply Matthieu, I'm now trying it as separate DRBDs for each of the VMs, as live migration would be nice but it's not a necessity. Only now I seem to be running into a problem of not being able to use the same network card for multiple DRBD resources. I'm hoping that I won't have to use a different IP for each resource!

Matthieu Labbé wrote: Hello, 2011/9/13 SlingPirate sling.t...@googlemail.com: Thank you for your input Lars, and apologies for my lack of understanding. I have done some more reading and I am still somewhat unsure as to how to approach this. I will try to re-explain what I'm after and I would be grateful for any more of your time and guidance.

            On-site              Off-site
      NODE-1      NODE-2         NODE-DR
      VM-1        VM-1           VM-1
      VM-2        VM-2           VM-2
      VM-3        VM-3           VM-3
      VM-4        VM-4           VM-4

Under normal conditions I would like to run VM-1 and VM-2 on NODE-1 while VM-3 and VM-4 run on NODE-2. If NODE-1 dies, NODE-2 runs all the VMs, or vice versa. If the site that NODE-1 and NODE-2 are based at blows up, NODE-3 is ready to run all VMs. I can use either block devices or image files for the VMs. What would you suggest would be the correct way to set this up?

You need more than one DRBD, for example: DRBD0 for VM-1 and VM-2, Primary on NODE-1, Secondary on NODE-2; DRBD1 for VM-3 and VM-4, Primary on NODE-2, Secondary on NODE-1; or use one DRBD per VM as suggested before.
You only need dual primary if you want to do live migration between NODE-1 and NODE-2 and in that case the classic 3-way setup doesn't work. If you don't need live migration, do classic 3-way setups. Hope that help, Matt. -- Matthieu Labbé http://mattlabbe.com/ ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user -- View this message in context: http://old.nabble.com/Enabling-a-stacked-resource---HELP%21-tp32423513p32471169.html Sent from the DRBD - User mailing list archive at Nabble.com. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] DRDB + OCF Active Active
Hi, On Wed, Sep 7, 2011 at 10:28 PM, Nick Khamis sym...@gmail.com wrote: Hello Everyone, We are looking to setup write intesive services using database technologies. Doing some research I yielded the attached document. Is there a issue in terms of performance using DRBD on an active active, and say mysql database? That being said, what is the best combination for clustering using DRBD: OCFS2 active active EXT3 active passive The second one. On top of HA, load balancing is also important to us. The document also detailed MySQL Cluster using the ndb engine. There are some benefits for using such a solution, but also downsides, therefore it is best to evaluate all of your requirements and see which fits best. On the MySQL Cluster with ndb approach, you have to assess what will be the database/s size/s, what would be an estimated growth per week, month, year, and plan your hardware requirements accordingly, as well as plan for expansion, as ndb is an in-memory database, which on MySQL Cluster scales to multiple nodes by partitioning the database and having a primary copy of a partition on a node, and one or more (minimum is one) backup copy of the same partition on another node (all stored in RAM). As more nodes are added to the MySQL Cluster, the partitions are split further and replicated onto new nodes as well, it allows linear scaling iirc. Every node maintains a transaction log on disk, therefore allowing recovery of a node based on this log. However, a node failure does not lead to service interruption, as there always is at least one other node maintaining a backup copy of the partition of the failed node in memory. When failure is detected, the node keeping the backup copy makes his copy primary and sets a new node to keep a backup copy. Also, all transactions are performed atomically via a two-phase commit protocol. 
MySQL Cluster usually does not imply the usage of another clustering technology on the same nodes, and given the high memory consumption, it's usually not the case to mix things where it isn't needed. One possible scenario would be to have all writes done on the MySQL Cluster, and from it have N frontends set up as Replication Slaves for Read requests. Load balancing writes can be done by having a frontend issue requests to each Data Node, but it's recommended that requests are sent to the DC (IIRC) and it will (based on whom has the writeable copy of a partition) send the request to the Data Node storing it, then that Node will send the reply to the frontend. There are multiple scenarios possible, but they usually involve having writes performed on the MySQL Cluster (it supports simultaneous access for read/write operations on every node via the two-phase commit protocol) and having read requests either on the Cluster or on Replication Slaves with the second option being the recommended one. MySQL Cluster holds all the databases in memory, so it's very fast, has self healing capabilities, built-in high availability, it can also use all of the CPU cores in a system and it relies on network transport for communication between nodes, thus one can upgrade the interconnects to Infiniband or some other solution for maximum performance. In terms of planning for MySQL Cluster, the following links will give some insight: http://www.severalnines.com/sizer/ http://www.fromdual.ch/mysql-cluster-memory-sizing The only use case for DRBD in a MySQL Cluster would be to also replicate the logs that it flushes to disk. In the event of a node failure, the cluster is still fine, as explained above, but restoring a node might take some time (get the log from the failed node, copy it to a new node, load it into RAM, join the node to the cluster, the node updates it's data to match the cluster state || fix the node, restore or not the logs, load them into RAM, etc.). 
By having the logs on a DRBD partition and replicating it to another server, the data is still available. If you have a spare server, it's just a matter of promoting the DRBD partition to Primary, mounting it, exporting it via NFS or whatever, mounting it on the spare server, loading the logs into RAM, and joining the node to the Cluster. It reduces MTTR. Hope it sheds some light onto the picture. Regards, Dan

Thanks in Advance, Nick

-- Dan Frincu CCNA, RHCE
Re: [DRBD-user] Latency problems with DRBD 0.8.4 and a 3ware RAID5
I sure hope that 0.8.4 is a typo. We are up to version 8.4.0. If you are really running drbd from years ago, I'd suggest an upgrade. Sorry, but I don't have any input on your throughput issue. Dan

-----Original Message----- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Florian Apolloner Sent: Wednesday, August 31, 2011 7:46 AM To: drbd-user@lists.linbit.com Subject: [DRBD-user] Latency problems with DRBD 0.8.4 and a 3ware RAID5

Hi, I am seeing horrible latency with the following configuration:
* protocol C
* RAID5 over a 3ware 9750-4i SAS2 RAID controller
* Dedicated Gigabit link between the two machines, no switch in between.

This is what I can tell you:
* iperf shows around 950 Mbits/sec -- sounds okay for gigabit ;)
* throughput for the drbd raid is 70-80 MByte/sec -- sounds good to me too.

Now on to the latency tests (backing device):
dd if=/dev/zero of=/dev/sda3 bs=512 count=1000 oflag=direct
512000 bytes (512 kB) copied, 0.0365779 seconds, 14.0 MB/s

and onto the drbd device:
dd if=/dev/zero of=/dev/drbd1 bs=512 count=1000 oflag=direct
512000 bytes (512 kB) copied, 9.651 seconds, 53.1 kB/s

Sounded awfully slow to me, so I compared with a ramdisk:
dd if=/dev/zero of=/dev/drbd2 bs=512 count=1000 oflag=direct
512000 bytes (512 kB) copied, 0.166689 seconds, 3.1 MB/s

Sounds way better, though I can't tell if that's slow or fast enough. The RTT for the link is: rtt min/avg/max/mdev = 0.096/0.166/0.207/0.043 ms

So let's summarize:
- Harddisk speed shouldn't be a problem (0.0365779 seconds for the write)
- Network speed shouldn't be a problem, as indicated by RTT and iperf
- Network and Harddisk combined -> problems.

Any hints on how to debug that? I am currently grasping at any straw I can get :/ Thx in advance and regards, Florian Apolloner
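The three dd runs quoted above are easier to compare as time per write. The following is a rough sketch that only re-does the arithmetic on the figures from the post (1000 direct writes of 512 bytes each); the function name and labels are made up for illustration, and it does not diagnose the cause.

```python
WRITES = 1000          # count=1000 in each dd run
RTT_AVG_MS = 0.166     # average ping RTT reported in the post

# Total elapsed seconds reported by dd for each target.
RUNS = {
    "backing device (/dev/sda3)": 0.0365779,
    "DRBD on the RAID (/dev/drbd1)": 9.651,
    "DRBD on a ramdisk (/dev/drbd2)": 0.166689,
}


def per_write_latency_ms(total_seconds: float, writes: int) -> float:
    """Average time for one synchronous write, in milliseconds."""
    return total_seconds / writes * 1000.0


for name, seconds in RUNS.items():
    ms = per_write_latency_ms(seconds, WRITES)
    print(f"{name}: {ms:.4f} ms per write "
          f"({ms / RTT_AVG_MS:.1f}x the average RTT)")
```

Read this way, the ramdisk-backed DRBD device spends about one average round trip per write, while the RAID-backed one spends close to 10 ms per write, even though the bare backing device manages roughly 0.04 ms.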
Re: [DRBD-user] data integrity in drbd
Reading between the lines on this thread, I think you have mixed access paths, and now believe that drbd is somehow involved in your troubles. It is perfectly correct to do what you attempted, i.e.: break the mirror, test some process, reestablish the mirror using the original data, and continue on.

Assuming the drbd resource r0 as /dev/drbd0, stored on /dev/sdb1 on nodeA (Primary) and nodeB (Secondary), this scenario would be accomplished by:

On NodeB:
=
drbdadm disconnect r0
drbdadm primary r0
mount /dev/drbd0 /somewhere
... do your test on nodeB ...
umount /dev/drbd0
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

and the nodes will sync up to the original data. The main node, nodeA, always has the resource as primary, and goes from Connected to WFConnection to SyncSource to Connected. No access to the physical device, /dev/sdb1, is done by any process other than drbd. You never stop drbd on either node.

I'm guessing that at some point you mounted /dev/sdb1 instead of /dev/drbd0 and that is the source of your problems: updates occurred that drbd did not see. Using the full disk rather than a partition (/dev/sdb instead of /dev/sdb1 in this case) could assist in preventing you from shooting yourself in the foot. But, you can ALWAYS shoot yourself in the foot.

I do not understand your comment "I shut down drbd, try to use rsync to recovery the snapshot, but failed". It sounds like there is a Logical Volume Manager involved, or some other underlying block device that supports snapshots. Again, drbd can only replicate changes to a block device that occur while drbd is handling that device. The observations above can be applied to any block device. If you expect drbd to be able to handle the block device, it must be the only access to that device; Connected or not, Primary or Secondary.
hth Dan Barker -Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Arnold Krille Sent: Friday, August 26, 2011 2:08 PM To: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] data integrity in drbd On Friday 26 August 2011 19:50:29 you wrote: In my situation, what should I do to completely resync the data of secondary node, including the drbd metadata ? Delete the secondaries disk, create the meta-data anew and make it sync completely from the primary. And if something is still wrong in the files on disk, restore them from the backup... (And don't ask public questions in private:) Have a nice weekend, Arnold ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] DRBD won't take 1G syncer rate
You are still mixing megabits and megabytes. Your 1000 megabit pipe won't take a 600 megabyte stream, or a 150 megabyte stream. The maximum is about 125 MB/s. DRBD talks (and is documented to talk) bytes. Most everyone else talks bits. You don't mention the speed you are getting. Also, if you have 3 resources syncing, each will try for the syncer limit. So, to use 50% of your capacity to sync 3 resources, you'd specify the rate as 21M. Note: you can change the rate on the fly, during a sync. Dan

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Matt Baer Sent: Friday, August 05, 2011 8:51 AM To: Caspar Smit Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

I was playing with the settings yesterday and it let me set it at 600M and it didn't make a difference in the sync speed at all. I then tried it with your suggestion, dropped it to 150M just to be safe. Still no difference. I wonder what the deal is. Could it be that this is the initial sync?

On Aug 5, 2011 1:29 AM, Caspar Smit c.s...@truebit.nl wrote: Hi Matt, 1000M means 1000 MB/s (megabytes), NOT 1000 Mbit/s. To reach 1000M you should have at least one (probably two) 10gbit interface(s). Since you have two 1gbit interfaces (bonded with balance-rr?) a value between 100M and around 170M would be more appropriate. Kind regards, Caspar

On 5 Aug 2011 08:21, Matt Baer mb...@lrnet1.com wrote: When setting the syncer rate in drbd.conf to 1G, it won't start, citing that 1G is invalid. Get the same thing with 1000M. Any clue as to why? It explicitly states that 1G is acceptable in the docs. I've triple checked and both interfaces are auto-negotiated at 1000mbps full duplex.
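The bits-versus-bytes arithmetic in the reply above can be checked with a short sketch. The function names are made up for illustration; the only inputs are the figures from the thread (a 1000 Mbit/s link, a 50% share, 3 resources syncing at once), and protocol overhead is ignored.

```python
def link_capacity_mbytes(link_mbits: float) -> float:
    """Convert a link speed in megabits/s to megabytes/s (8 bits per byte)."""
    return link_mbits / 8


def per_resource_rate(link_mbits: float, share: float, resources: int) -> float:
    """Rate in MB/s for each resource when `resources` sync at the same
    time and together may use `share` (0..1) of the link."""
    return link_capacity_mbytes(link_mbits) * share / resources


print(f"1000 Mbit/s is at most {link_capacity_mbytes(1000):.0f} MB/s")
print(f"50% of that split over 3 resources: "
      f"{per_resource_rate(1000, 0.5, 3):.1f} MB/s each")
```

Rounded to a whole number, the per-resource figure matches the 21M suggested in the reply.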
Re: [DRBD-user] DRBD won't take 1G syncer rate
We've been off-list for a few messages, but this is now interesting enough to be public. I apologize for the top-posting. Please read backwards. Dan

Well, to be certain, I'd dd the disks to zeros individually, and then start with them sync'd.

drbdadm down all
dd if=/dev/zero of=/drbdbackingdevice bs=1M oflag=direct

on both sides.

New Blank Disk:
===
# On both nodes, initialize meta data and configure the device.
drbdadm -- --force create-md r0
# They need to do the initial handshake, so they know their sizes.
drbdadm up r0
# They are now Connected Secondary/Secondary Inconsistent/Inconsistent. Generate a new current-uuid and clear the dirty bitmap.
drbdadm -- --clear-bitmap new-current-uuid r0
# They are now Connected Secondary/Secondary UpToDate/UpToDate.
drbdadm primary r0

Now, recreate your empty ext3 file system and you are in sync. Dan

From: Matt Baer [mailto:mb...@lrnet1.com] Sent: Friday, August 05, 2011 11:26 AM To: Dan Barker Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate

Well we're the perfect supplements for each other because, as you can see, I don't know DRBD from a hole in the wall. Yes, it's a brand new resource. I don't think I would say it's full of zeros, as it has a clean ext3 file system on it. Yes, I would LOVE to skip the sync, I've been dealing with this for weeks now and right when I was about to go live, I tested the failover and it didn't work because of a service heartbeat wanted to start wasn't going all that well. While troubleshooting, I lost my 100% perfectly live server and have to start from scratch. Problem is I only have two days to do it and the thing has to sync 1.8TB at 12MB/s. I have no idea where the bottleneck could be. Two servers, a cable connecting eth1 to eth1, both are auto-negotiated at 1gbps on their own /30 subnet. The only thing there would be garbage NIC cards, possible, but not probable, or the cable, more likely, but I've never had an issue with it until now.
Freshly constructed servers, too. I tried the drbdsetup /dev/drbd0 syncer -r 120M, been running like that for about 5 minutes now and it hasn't changed at all. On Fri, Aug 5, 2011 at 10:13 AM, Dan Barker dbar...@visioncomm.net wrote: Is this a brand new resource? Why are you doing a full sync? If it's brand new (full of zeros), you can skip the sync. Instructions upon request. Btw, I don't know why you are getting 12% of your requested syncer rate. I'm not a hot-shot linux performace analyzer, but there is a bottleneck somewhere. I get 25M routinely here on GB nics. I have my Syncer set to 25M. It drops to about 14M (each) if 2 are syncing. To change sync rate without stop/start drbd: drbdsetup /dev/drbd1 syncer -r 120M AL Extents seems a bit low. I use 1801 (big prime number that felt about right). Dan From: Matt Baer [mailto:mb...@lrnet1.com] Sent: Friday, August 05, 2011 11:03 AM To: Dan Barker Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate Ok, revised /etc/drbd.conf and restarted DRBD with the following common { syncer { rate 100M; al-extents 257; } } And I'm getting from /proc/drbd: GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-10-03 11:30:17 0: cs:SyncTarget st:Secondary/Primary ds:Inconsistent/UpToDate C r--- ns:0 nr:281056 dw:272864 dr:0 al:0 bm:16 lo:257 pe:1969 ua:256 ap:0 oos:1308428488 [] sync'ed: 0.1% (1277762/1278028)M finish: 25:57:39 speed: 13,904 (12,400) K/sec And I only have one resource, r0. All it's syncing right now is the post mkfs.ext3 /dev/drbd0 On Fri, Aug 5, 2011 at 9:51 AM, Dan Barker dbar...@visioncomm.net wrote: You are still mixing megabits and megabytes. Your 1000 megabit pipe won't take a 600 megabyte stream, or a 150 megabyte stream. The maximum is about 125 MBps. DRBD talks (and is documented to talk) bytes. Most everyone else talks bits. You don't mention the speed you are getting. Also, if you have 3 resources syncing, each will try for the syncer limit. 
So, to use 50% of your capacity to sync 3 resources, you'd specify the rate as 21M. Note: you can change the rate on the fly, during a sync. Dan From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Matt Baer Sent: Friday, August 05, 2011 8:51 AM To: Caspar Smit Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate I was playing with the settings yesterday and it let me set it at 600M and it didn't make a difference in the sync speed at all. I then tried it with your suggestion, dropped it to 150M just to be safe. Still no difference. I wonder what the deal is. Could it be that this is the initial sync? On Aug 5, 2011 1:29 AM, Caspar Smit c.s...@truebit.nl wrote: Hi Matt, 1000M means 1000 Mb/s NOT 1000mbps. To reach 1000M you should have at least
Re: [DRBD-user] DRBD won't take 1G syncer rate
OK, You can skip the zero, but the devices won't pass an online verify. It won't hurt anything. All the sectors that need to be synced will be synced. The disks are mostly zeros, so an online verify wouldn't do much, but it's nice to know. To find out how long the DD will take, kill -USR1 taskid (frightening command, but it makes dd tell you how far it's along and does NOT kill it. The 1M blocksize will help a lot. dd defaults to 512). If you do skip the dd, the first verify will identify all the non-zero sectors, and they'll sync up 100's of times faster than a full sync. Good Luck! Dan From: Matt Baer [mailto:mb...@lrnet1.com] Sent: Friday, August 05, 2011 12:10 PM To: Dan Barker Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate Roger that. It's running, but they're 1.8TB a piece so it'll take a while. Just wanted to let you know, no need for it to go to the list. Thanks for the help thus far, it's been difficult to deal with this and I have to get it running ASAP. On Fri, Aug 5, 2011 at 11:06 AM, Dan Barker dbar...@visioncomm.net wrote: That's not the backing device. The backing device is something like /dev/sdb. The drbd device is called device in your config. The backing device is called disk. It should not be mounted. Dan From: Matt Baer [mailto:mb...@lrnet1.com] Sent: Friday, August 05, 2011 11:44 AM To: Dan Barker Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate After I down all, it won't let me issue the dd command citing: dd: opening `/dev/drbd0': Read-only file system On Fri, Aug 5, 2011 at 10:37 AM, Dan Barker dbar...@visioncomm.net wrote: We've been off-list for a few messages, but this is now interesting enough to be public. I apologize for the top-posting. Please read backwards. Dan Well, to be certain, I'd dd the disks to zeros individually, and then start with them sync'd. drbdadm down all dd if=/dev/zero of=/drbdbackingdevice bs=1M oflag=direct on both sides. 
New Blank Disk: === #On both nodes, initialize meta data and configure the device. drbdadm -- --force create-md r0 #They need to do the initial handshake, so they know their sizes. drbdadm up r0 #They are now Connected Secondary/Secondary Inconsistent/Inconsistent. Generate a new current-uuid and clear the dirty bitmap. drbdadm -- --clear-bitmap new-current-uuid r0 #They are now Connected Secondary/Secondary UpToDate/UpToDate. drbdadm primary r0 Now, recreate your empty ext3 file system and you are in sync. Dan From: Matt Baer [mailto:mb...@lrnet1.com] Sent: Friday, August 05, 2011 11:26 AM To: Dan Barker Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate Well we're the perfect supplements for each other because, as you can see, I don't know DRBD from a hole in the wall. Yes, it's a brand new resource. I don't think I would say it's full of zeros, as it has a clean ext3 file system on it. Yes, I would LOVE to skip the sync, I've been dealing with this for weeks now and right when I was about to go live, I tested the failover and it didn't work because of a service heartbeat wanted to start wasn't going all that well. While troubleshooting, I lost my 100% perfectly live server and have to start from scratch. Problem is I only have two days to do it and the thing has to sync 1.8TB at 12MB/s. I have no idea where the bottleneck could be. Two servers, a cable connecting eth1 to eth1, both are auto-negotiated at 1gbps on their own /30 subnet. The only thing there would be garbage NIC cards, possible, but not probable, or the cable, more likely, but I've never had an issue with it until now. Freshly constructed servers, too. I tried the drbdsetup /dev/drbd0 syncer -r 120M, been running like that for about 5 minutes now and it hasn't changed at all. On Fri, Aug 5, 2011 at 10:13 AM, Dan Barker dbar...@visioncomm.net wrote: Is this a brand new resource? Why are you doing a full sync? 
If it's brand new (full of zeros), you can skip the sync. Instructions upon request. Btw, I don't know why you are getting 12% of your requested syncer rate. I'm not a hot-shot linux performace analyzer, but there is a bottleneck somewhere. I get 25M routinely here on GB nics. I have my Syncer set to 25M. It drops to about 14M (each) if 2 are syncing. To change sync rate without stop/start drbd: drbdsetup /dev/drbd1 syncer -r 120M AL Extents seems a bit low. I use 1801 (big prime number that felt about right). Dan From: Matt Baer [mailto:mb...@lrnet1.com] Sent: Friday, August 05, 2011 11:03 AM To: Dan Barker Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate Ok, revised /etc/drbd.conf and restarted DRBD with the following common { syncer { rate 100M; al-extents 257; } } And I'm getting from /proc/drbd: GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-10-03 11:30:17 0
Re: [DRBD-user] DRBD won't take 1G syncer rate
Actually, that makes some sense. If the network is way faster than the sync, and the dd (which doesn't even use the network) is the same speed, then there is something bad wrong with the underlying device or configuration. You are not going to be happy until you get that fixed. Then, the dd should beat the sync by a good bit. You can skip the dd and build your filesystem and use it RIGHT NOW, but there is still something wrong and the performance will probably suck. But, you can get some work done. If fixing the underlying problem doesn't wipe your data, then an online verify and disconnect/connect (of the secondary node) will get you synced up with zero downtime. hth Dan From: Matt Baer [mailto:mb...@lrnet1.com] Sent: Friday, August 05, 2011 12:34 PM To: Dan Barker Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate Ok, looking at this, my guess is that dd and the sync would take roughly the same amount of time. Actually, if I am to believe the output of the kill command you included, using dd will actually take more time. So I just take your previous instructions and omit the dd command to skip it? I don't care if the sync takes forever, I just want to be able to DO something while it's syncing. On Fri, Aug 5, 2011 at 11:15 AM, Dan Barker dbar...@visioncomm.net wrote: OK, You can skip the zero, but the devices won't pass an online verify. It won't hurt anything. All the sectors that need to be synced will be synced. The disks are mostly zeros, so an online verify wouldn't do much, but it's nice to know. To find out how long the DD will take, kill -USR1 taskid (frightening command, but it makes dd tell you how far it's along and does NOT kill it. The 1M blocksize will help a lot. dd defaults to 512). If you do skip the dd, the first verify will identify all the non-zero sectors, and they'll sync up 100's of times faster than a full sync. Good Luck! 
Dan From: Matt Baer [mailto:mb...@lrnet1.com] Sent: Friday, August 05, 2011 12:10 PM To: Dan Barker Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate Roger that. It's running, but they're 1.8TB a piece so it'll take a while. Just wanted to let you know, no need for it to go to the list. Thanks for the help thus far, it's been difficult to deal with this and I have to get it running ASAP. On Fri, Aug 5, 2011 at 11:06 AM, Dan Barker dbar...@visioncomm.net wrote: That's not the backing device. The backing device is something like /dev/sdb. The drbd device is called device in your config. The backing device is called disk. It should not be mounted. Dan From: Matt Baer [mailto:mb...@lrnet1.com] Sent: Friday, August 05, 2011 11:44 AM To: Dan Barker Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate After I down all, it won't let me issue the dd command citing: dd: opening `/dev/drbd0': Read-only file system On Fri, Aug 5, 2011 at 10:37 AM, Dan Barker dbar...@visioncomm.net wrote: We've been off-list for a few messages, but this is now interesting enough to be public. I apologize for the top-posting. Please read backwards. Dan Well, to be certain, I'd dd the disks to zeros individually, and then start with them sync'd. drbdadm down all dd if=/dev/zero of=/drbdbackingdevice bs=1M oflag=direct on both sides. New Blank Disk: === #On both nodes, initialize meta data and configure the device. drbdadm -- --force create-md r0 #They need to do the initial handshake, so they know their sizes. drbdadm up r0 #They are now Connected Secondary/Secondary Inconsistent/Inconsistent. Generate a new current-uuid and clear the dirty bitmap. drbdadm -- --clear-bitmap new-current-uuid r0 #They are now Connected Secondary/Secondary UpToDate/UpToDate. drbdadm primary r0 Now, recreate your empty ext3 file system and you are in sync. 
Dan From: Matt Baer [mailto:mb...@lrnet1.com] Sent: Friday, August 05, 2011 11:26 AM To: Dan Barker Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] DRBD won't take 1G syncer rate Well we're the perfect supplements for each other because, as you can see, I don't know DRBD from a hole in the wall. Yes, it's a brand new resource. I don't think I would say it's full of zeros, as it has a clean ext3 file system on it. Yes, I would LOVE to skip the sync, I've been dealing with this for weeks now and right when I was about to go live, I tested the failover and it didn't work because of a service heartbeat wanted to start wasn't going all that well. While troubleshooting, I lost my 100% perfectly live server and have to start from scratch. Problem is I only have two days to do it and the thing has to sync 1.8TB at 12MB/s. I have no idea where the bottleneck could be. Two servers, a cable connecting eth1 to eth1, both are auto-negotiated at 1gbps on their own /30 subnet. The only thing there would be garbage NIC cards, possible, but not probable
Re: [DRBD-user] DRBD on XenServer
You will only see the node resources on the primary node. On the master node, stop all processes using the drbd resource (I'll call it r0) and run drbdadm secondary r0. On the slave node, run drbdadm primary r0. Now the resource is available on the slave node.

In practice, you have to trust a status of UpToDate in most situations. Accessing the slave node while secondary would most likely corrupt it (unless it's cluster aware, a different issue altogether). drbd prevents that (accessing a secondary resource).

hth

Dan

-Original Message- From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of waynecsh Sent: Thursday, July 28, 2011 12:08 AM To: drbd-user@lists.linbit.com Subject: [DRBD-user] DRBD on XenServer

Hi, I'm trying to set up DRBD on Citrix Free XenServer in master/slave mode. I've managed to get to the stage where 'cat /proc/drbd' shows the two nodes are synchronised in primary/secondary mode, and defined a local storage on the primary node using xe sr-create. However, on the slave XenServer, I cannot see the new storage. Even the 'lvs' and 'vgs' commands do not show anything.

Can anyone advise me on how to confirm that the newly created storage is being replicated to the slave node? Do I need to define the storage again on the slave node?

Thank you. Regards, Wayne

-- View this message in context: http://old.nabble.com/DRBD-on-XenServer-tp32153683p32153683.html Sent from the DRBD - User mailing list archive at Nabble.com.
___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] DRBD 8.0.13-HA 2.1.3 sync after physical move
All you need to do (besides ensure the network is working – not “spotty”) is connect the devices. Secondary Up-To-Date will be the sync-target; that’s what secondary means. You don’t need to do any invalidation. The only way to mess up your data would be to set the Zimbra2 node to primary and then mount/access the data. THAT would create a split-brain. As long as Zimbra1 is primary and Zimbra2 is secondary, connecting the nodes successfully will resync in the proper direction.

hth

Dan “Top Poster”

From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Bruce Wolfe Sent: Friday, July 29, 2011 5:50 PM To: drbd-user@lists.linbit.com Subject: [DRBD-user] DRBD 8.0.13-HA 2.1.3 sync after physical move

Hi! Could use a little help. I'm a newb to DRBD-HA but understand the concepts and am handy on the CLI. In fact, I found that the Secondary server has been 'disconnected' for over a month. These servers run only Zimbra at the moment, on CentOS 5.5. I hope what I provide below is enough to get the ball rolling. Thank you in advance for any prompt help you may be able to offer today.

I inherited this system, and after a physical move of the bare metal to a faster connection at the data center, drbd-ha seems to not connect and sync. The internal LAN is working, as I can ping from each server to the other. But it is spotty to be able to telnet in. The peer server, Zimbra2, gets it here and there using 192.168.1.2, but the master, Zimbra1, I can never get to work using the standard port, 'telnet 192.168.1.1 7788'. Despite that, Zimbra1 is st:Primary/Unknown and Zimbra2 is st:Secondary/Unknown.

Since Zimbra2 has been drbd-ha offline for some time, I was advised by others on IRC to discard the data before reconnecting.
Logged into Zimbra2 server and after disconnecting Zimbra2 (Secondary) using 'drbdadm disconnect repdata' I performed 'drbdadm -- --discard-my-data connect repdata' but /proc/drbd on Zimbra2 still results in UpToDate. I even tried stopping HA on Zimbra2 but that didn't make a difference either. Any ideas?

I want Zimbra1 (Primary) to populate Zimbra2 as it is the working server with the most recent data. Now, /dev/drbd0 exists on Zimbra2 but is not mounted that I can see. In order to --discard-my-data, does the drive need to be mounted?

On Zimbra1 it is mounted like this:

/dev/drbd0 on /opt type ext3 (rw)

On Zimbra2 there is no record of it being mounted.

Here is /proc/drbd for Zimbra1:

@zimbra1 ~]# cat /proc/drbd
version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe build by buildsvn@c5-x8664-build, 2008-10-03 10:12:56
 0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown r---
    ns:0 nr:0 dw:1505899068 dr:1246861951 al:3145701 bm:3145590 lo:0 pe:0 ua:0 ap:0
    resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
    act_log: used:0/127 hits:353787267 misses:3888262 starving:1706 dirty:741251 changed:3145701

Here is /proc/drbd for Zimbra2:

@zimbra2 ~]# cat /proc/drbd
version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe build by buildsvn@c5-x8664-build, 2008-10-03 10:12:56
 0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown r---
    ns:0 nr:0 dw:912 dr:41418 al:18 bm:62 lo:0 pe:0 ua:0 ap:0
    resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
    act_log: used:0/127 hits:210 misses:18 starving:0 dirty:0 changed:18

Also, here is /etc/drbd.conf if that helps.
@zimbra2 ~]# cat /etc/drbd.conf
global { usage-count no; }
resource repdata {
  protocol C;
  startup { wfc-timeout 0; degr-wfc-timeout 120; }
  disk { on-io-error detach; }
  syncer { rate 25M; }
  on zimbra1.marininstitute.org {
    device /dev/drbd0;
    disk /dev/VolGroup01/LogVol01;
    address 192.168.1.1:7788;
    meta-disk internal;
  }
  on zimbra2.marininstitute.org {
    device /dev/drbd0;
    disk /dev/VolGroup00/LogVol00;
    address 192.168.1.2:7788;
    meta-disk internal;
  }
}

Thank you in advance for any prompt help you may be able to offer today.

Bruce M. Wolfe, M.S.W., CIO
24 Belvedere St. San Rafael, CA 94901
415/456.5692 x213 main 415/257.2493 office 415/456.0491 fax
KI6BSL ham

He that falls in love with himself will have no rivals. - Benjamin Franklin
___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
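Both /proc/drbd listings in this thread show cs:StandAlone, which means neither node is currently trying to reach its peer, so no resync can start until they are told to connect. A small helper for pulling the cs, st and ds fields out of a status line might look like the sketch below. It assumes the 8.0-style line format quoted in the message; the function name drbd_field and the sample variable are made up for illustration.

```shell
#!/bin/sh
# Print one field (cs, st or ds) from a DRBD 8.0-style /proc/drbd status line.
# drbd_field is an illustrative helper, not a DRBD tool.
drbd_field() {
    key=$1
    line=$2
    # Split the line into one token per row, then strip the "key:" prefix.
    printf '%s\n' "$line" | tr ' ' '\n' | sed -n "s/^${key}://p" | head -n 1
}

# Sample line copied from the Zimbra2 output quoted in the thread.
status=' 0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown r---'

echo "connection: $(drbd_field cs "$status")"
echo "roles:      $(drbd_field st "$status")"
echo "disks:      $(drbd_field ds "$status")"

if [ "$(drbd_field cs "$status")" = "StandAlone" ]; then
    echo "node is not attempting to connect to its peer"
fi
```

On a live system the line would come from /proc/drbd itself, for example by passing in the output of grep ' cs:' /proc/drbd.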