Re: [DRBD-user] drbd90 unexpected split-brain, possible uuid_compare issue

2020-10-19 Thread Igor Cicimov
On Tue, Oct 20, 2020 at 4:17 AM Jeremy Faith  wrote:

> Hi,
>
> drbd90 kernel module version:9.0.22-2(also 9.0.25-1 compiled from 
> source)drbd90-utils:9.12.2-1
> kernel:3.10.0-1127.18.2.el7.x86_64
>
> 4 nodes
> n1 primary
> n2,n3,n4 all secondary
>
> If I run the following script then sometimes, after the restarts, some of
> the nodes get stuck in Outdated or Inconsistent state forever. The loop
> generally works correctly several times (max was about 14) before getting
> stuck.
> I have NOT been able to replicate the split-brain state in this way but I
> think it is related.
>
> while true
> do
>   ssh n4 'service corosync stop'
>   ssh n3 'service corosync stop'
>   ssh n2 'service corosync stop'
>   ssh n1 'service corosync stop'
>   sleep 5
>   ssh n1 'service pacemaker start'
>   ssh n2 'service pacemaker start'
>   ssh n3 'service pacemaker start'
>   ssh n4 'service pacemaker start'
>   #At this point r0 resource is mounted on /home
>   #and processes are writing to it on n1
>   while true
>   do
> sleep 1
> echo "events2 `date`"
> drbdsetup events2 --now  -c
> num_u2d=`drbdsetup events2 --now|grep -c disk:UpToDate`
> echo "num UpToDate=$num_u2d"
> [ "$num_u2d" = 4 ]&&break
> sleep 4
>   done
> done
>
> If I change the stop order to
>   n1,n4,n3,n2
> and the start order to
>   n2,n3,n4,n1
> There is never a problem, I left this running over the weekend and it
> worked over 5 thousand times.
>
> I replicated the same issue using drbdadm commands directly so this is not
> a corosync/pacemaker issue,
> i.e. running the following script on n1 also produces the problem:-
> while true
> do
>   for n in 4 3 2
>   do
> ssh n$n 'drbdadm down r0'
>   done
>   #stop process that write to /home
>   a2ksys stop >> a2ksys_stop.log 2>&1
>   while [ -d /home/cem ]
>   do
> umount /home
> [ ! -d /home/cem ]&&break
> echo lsof output
> lsof /home
> sleep 5
>   done
>   drbdadm secondary r0
>   drbdadm down r0
>   #END OF STOP
>
>   drbdadm up r0
>   while true
>   do
> drbdadm primary r0&&break
> echo "sleep 5 before retry primary"
> sleep 5
>   done
>   while true
>   do
> mount -orw /dev/drbd0 /home&&break
> echo "sleep 5 before retry mount"
> sleep 5
>   done
>   #start processes that do some writes HERE
>   a2ksys start >> a2ksys_start.log 2>&1
>   ssh n2 'drbdadm up r0'
>   ssh n3 'drbdadm up r0'
>   ssh n4 'drbdadm up r0'
>   while true
>   do
> sleep 1
> echo "events2 `date`"
> drbdsetup events2 --now  -c
> num_u2d=`drbdsetup events2 --now|grep -c disk:UpToDate`
> echo "num UpToDate=$num_u2d"
> [ "$num_u2d" = 4 ]&&break
> sleep 4
>   done
> done
>
> r0.res:-
> resource r0 {
>   handlers {
> fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
> unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
> split-brain "/bin/touch /tmp/drbd_split_brain.flg";
>   }
>   startup {
> wfc-timeout 0;  ## Infinite!
> degr-wfc-timeout 120;  ## 2 minutes.
> outdated-wfc-timeout 120;
>   }
>   disk {
> resync-rate 100M;
> on-io-error detach;
> disable-write-same;
>   }
>   net {
> protocol C;
>   }
>
>   device /dev/drbd0;
>   disk   /dev/VolGroup00/lv_home;
>   meta-disk  /dev/VolGroup00/lv_drbd_meta [0];
>
>   on n1 {
> address 192.168.52.151:7789;
> node-id 1;
>   }
>   on n2 {
> address 192.168.52.152:7789;
> node-id 2;
>   }
>   on n3 {
> address 192.168.53.151:7789;
> node-id 3;
>   }
>   on n4 {
> address 192.168.53.152:7789;
> node-id 4;
>   }
>
>   connection-mesh {
> hosts n1 n2 n3 n4;
>   }
> }
>
> Is there any more information I can provide that would help to track down
> this issue?
>
> Regards,
> Jeremy Faith
> ___
> Star us on GITHUB: https://github.com/LINBIT
> drbd-user mailing list
> drbd-user@lists.linbit.com
> https://lists.linbit.com/mailman/listinfo/drbd-user
>


You should definitely stop pacemaker before corosync and start in the
opposite order.
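
For reference, a minimal sketch of that ordering, reusing the ssh loop style from the original script (service names as in the post; adjust if the nodes use systemctl directly):

# Stop pacemaker first, then corosync, on every node:
for n in 4 3 2 1; do
  ssh n$n 'service pacemaker stop && service corosync stop'
done

# Start in the opposite order: corosync first, then pacemaker:
for n in 1 2 3 4; do
  ssh n$n 'service corosync start && service pacemaker start'
done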

___

Re: [DRBD-user] LINSTOR on PROXMOX : How to deploy ressources on a new satellite?

2019-02-03 Thread Igor Cicimov
Yannis,

On Sun, Feb 3, 2019 at 9:39 PM Yannis Milios 
wrote:

>
> Are you saying this needs to be done for every single resource, potentially
>> hundreds of VMs with multiple disks attached? This sounds like a huge pita.
>>
>
> Yes.  However, I did a test. I temporarily reduced redundancy level in
> /etc/pve/storage.cfg and then created a new VM in PVE.
> Then I added the resource to the additional node by using 'linstor
> resource create' command. Finally I checked the properties of the resource
> and I noticed that the two important keys, 'PeerSlots' and 'StorPoolName'
> were automatically added to the newly added resource, so I would assume
> that this is not an issue anymore...
>
> Yannis
>
Thanks for your input on this, very useful info!
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] LINSTOR on PROXMOX : How to deploy ressources on a new satellite?

2019-02-02 Thread Igor Cicimov
On Sun, 3 Feb 2019 3:42 am Yannis Milios wrote:

> You have to specify which storage pool to use for the resource, otherwise
> it will default to 'DfltStorPool', which does not exist. So that would be
> something like this...
>
> $ linstor resource create pve3 vm-400-disk-1 --storage-pool 
>
>
> It might be also wise to check and compare the settings of the resource on
> an existing node, and then add any missing entries for the resource on the
> new node. For example ..
>
> $ linstor rd lp pve2 vm-400-disk-1    # This will show the settings for
>> vm-400-disk-1 on node pve2
>
>
>
>> $ linstor rd lp pve3 vm-400-disk-1    # This will show the settings for
>> vm-400-disk-1 on node pve3
>
>
> Compare settings, and if needed add any missing entries. This is only
> needed for the existing resources.
>

Are you saying this needs to be done for every single resource, potentially
hundreds of VMs with multiple disks attached? This sounds like a huge pita.
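
(If it really does come to scripting it, a rough sketch of that per-resource step might look like the following; the node, pool and resource names are purely illustrative, and the resource list could be taken from 'linstor resource-definition list':)

NEW_NODE=pve3
POOL=drbdpool
for res in vm-400-disk-1 vm-401-disk-1 vm-402-disk-1; do
  linstor resource create "$NEW_NODE" "$res" --storage-pool "$POOL"
done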

> Any new resources (VMs) you create on Proxmox will be automatically created
> with the correct settings (assuming that you have increased redundancy from
> 2 to 3 in /etc/pve/storage.cfg).
>
> P.S
> I strongly recommend using 'linstor interactive' mode to familiarise
> yourself to linstor command line parameters.
>
> Yannis
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD online resize drbd dual-primary on Pacemaker

2018-12-14 Thread Igor Cicimov
Never mind, I decided not to be lazy (Saturday and all that) and do it
properly. Done now.
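
(For anyone finding this in the archive: "properly" here means roughly the sequence from the 8.4 guide linked below. The following is only a sketch with made-up LV names, and it assumes the backing device is grown on both nodes first.)

# On both nodes, grow the backing device (illustrative names):
lvextend -L +50G /dev/vg0/lv_r0

# Demote one side, e.g. on node B:
drbdadm secondary r0

# Trigger the online resize from the node that stays Primary:
drbdadm resize r0
# (drbdadm -- --assume-clean resize r0 skips syncing the new area and is
#  only safe if the new space is known to be identical on both sides)

# When the sync of the new section is done, promote node B again:
drbdadm primary r0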

On Sat, 15 Dec 2018 12:40 pm Igor Cicimov wrote:

> Hi,
>
> According to https://docs.linbit.com/docs/users-guide-8.4/#s-resizing,
> when resizing DRBD 8.4 device online one side of the mirror needs to
> be Secondary. I have dual primary setup with Pacemaker and GFS2 as
> file system and wonder if I need to demote one side to Secondary
> before I run:
>
> drbdadm -- --assume-clean resize 
>
> or it will still work while both sides are Primary? The resource has
> internal metadata.
>
> Thanks
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD online resize drbd dual-primary on Pacemaker

2018-12-14 Thread Igor Cicimov
Hi,

According to https://docs.linbit.com/docs/users-guide-8.4/#s-resizing,
when resizing DRBD 8.4 device online one side of the mirror needs to
be Secondary. I have dual primary setup with Pacemaker and GFS2 as
file system and wonder if I need to demote one side to Secondary
before I run:

drbdadm -- --assume-clean resize 

or it will still work while both sides are Primary? The resource has
internal metadata.

Thanks
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] umount /drbdpart takes >50 seconds

2018-12-13 Thread Igor Cicimov
On Fri, Dec 14, 2018 at 2:57 AM Lars Ellenberg 
wrote:

> On Wed, Dec 12, 2018 at 10:16:09AM +0100, Harald Dunkel wrote:
> > Hi folks,
> >
> > using drbd umounting /data1 takes >50 seconds, even though the file
> > system (ext4, noatime, default) wasn't accessed for more than 2h.
> > umount ran with 100% CPU load.
> >
> > # sync
> > # time umount /data1
> >
> > real0m52.772s
> > user0m0.000s
> > sys 0m52.740s
> >
> >
> > This appears to be a pretty long time. I am concerned that there
> > is some data sleeping in a buffer that gets flushed only at umount
> > time.
> >
> > Kernel is version 4.18.0-0.bpo.1-amd64 on Debian Stretch. drbdutils
> > is 8.9.10-2. drbd.conf is attached. The bond2 interface used for
> > drbd synchronization is based upon 2 * 10 Gbit/sec NICs.
> >
> > Every insightful comment is highly appreciated.
>
> Unlikely to have anything to do with DRBD.
>
> since you apparently can reproduce, monitor
> grep -e Dirty -e Writeback /proc/meminfo
> and slabtop before/during/after umount.
>
> Also check sysctl settings
> sysctl vm | grep dirty
>

Good point. People running servers with huge amounts of RAM should
understand that there is also a huge amount of cache that needs to get flushed
to the device before the filesystem can be unmounted.
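
A quick sketch of how one might watch that during the umount, using the checks Lars suggests (the values in the last two lines are illustrative only, not recommendations):

# In a second terminal while the umount runs:
watch -n1 'grep -e Dirty -e Writeback /proc/meminfo'

# Current writeback tuning:
sysctl vm | grep dirty

# Example of flushing earlier in the background (illustrative values):
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10
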
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD 9 Primary-Secondary, Pacemaker, and STONITH

2018-11-13 Thread Igor Cicimov
On Wed, Nov 14, 2018 at 8:31 AM Bryan K. Walton
 wrote:
>
> I have a two-node DRBD 9 resource configured in Primary-Secondary mode
> with automatic failover configured with Pacemaker.
>
> I know that I need to configure STONITH in Pacemaker and then set DRBD's
> fencing to "resource-and-stonith".
>
> The nodes are Supermicro servers with IPMI.  I'm planning to use IPMI
> for my (first) fencing level.
>
> Where I'm confused is regarding whether I must have a second fencing
> level beyond IPMI?  Or will DRBD's fencing configuration, combined with
> IPMI be good enough?
>
> Looking at:
> http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>
> It reads:
>
> "A common mistake people make when choosing a STONITH device
> is to use a remote power switch (such as many on-board IPMI controllers)
> that shares power with the node it controls. If the power fails in such
> a case, the cluster cannot be sure whether the node is really offline,
> or active and suffering from a network fault, so the cluster will stop
> all resources to avoid a possible split-brain situation."
>
> I don't understand this.  If the power fails to a node, then won't the
> node, by definition be down (since there is no power going to the node)?
> So, how then could there be a split brain when one node has no power?

And how is the other node supposed to know that its peer crashed due
to a power failure? From its standpoint the node disappeared and it
has to attempt to STONITH. Now, if the STONITH device on the other
side gets its power via the same supply as the server, then STONITH is
not possible because the STONITH device itself will be powered down
too. You see the problem now?
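
If you do add a second level (say a switched PDU on a separate power feed), a rough pcs sketch could look like the following; the agents, addresses and parameter names are purely illustrative and depend on your hardware and fence-agent versions:

# Level 1: on-board IPMI of each node
pcs stonith create fence_n1_ipmi fence_ipmilan ip=10.0.0.11 \
    username=admin password=secret pcmk_host_list=node1
pcs stonith create fence_n2_ipmi fence_ipmilan ip=10.0.0.12 \
    username=admin password=secret pcmk_host_list=node2

# Level 2: a PDU that does not share power with the nodes
pcs stonith create fence_pdu fence_apc_snmp ip=10.0.0.20 \
    pcmk_host_map="node1:1;node2:2"

# Try IPMI first, fall back to the PDU
pcs stonith level add 1 node1 fence_n1_ipmi
pcs stonith level add 2 node1 fence_pdu
pcs stonith level add 1 node2 fence_n2_ipmi
pcs stonith level add 2 node2 fence_pdu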

>
>
> Is the above quote stating that if Pacemaker can't confirm that one
> node has been STONITHed, that it won't allow the remaining node to work,
> either?
>
> Thanks!
> Bryan Walton
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] linstor-proxmox controller toggle tests

2018-11-13 Thread Igor Cicimov
On Tue, Nov 13, 2018 at 3:37 AM Yannis Milios 
wrote:

> As far as I know, Proxmox does not need 3 nodes and/or a quorum, and the
>> LINSTOR controller does not care either.
>>
>
> Thanks for confirming this Robert.
> In my experience, Promox requires a minimum 3 nodes when HA is
> enabled/required. When HA is enabled, and one of the 2 cluster nodes goes
> down, then HA will not function properly, due to not enough remaining votes
> for a working,healthy cluster.
>

Not if you set:

quorum {
  provider: corosync_votequorum
  two_node: 1
...
}

in the corosync config file.

> Since LINSTOR controller functionality in Proxmox is based on HA (VM),
> that will (indirectly) affect also LINSTOR availability, as the HA VM used
> to host it, will stop functioning as soon as Proxmox cluster (corosync)
> looses the quorum majority, hence my recommendation of 3 nodes.
>
> Yannis
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd+lvm no bueno

2018-07-28 Thread Igor Cicimov
On Sun, 29 Jul 2018 1:13 am Eric Robinson  wrote:

>
>
>
>
> > -Original Message-
> > From: Eric Robinson
> > Sent: Saturday, July 28, 2018 7:39 AM
> > To: Lars Ellenberg ;
> drbd-user@lists.linbit.com
> > Subject: RE: [DRBD-user] drbd+lvm no bueno
> >
> > > > > Lars,
> > > > >
> > > > > I put MySQL databases on the drbd volume. To back them up, I pause
> > > > > them and do LVM snapshots (then rsync the snapshots to an archive
> > > > > server). How could I do that with LVM below drbd, since what I
> > > > > want is a snapshot of the filesystem where MySQL lives?
> > > >
> > > > You just snapshot below DRBD, after "quiescing" the mysql db.
> > > >
> > > > DRBD is transparent, the "garbage" (to the filesystem) of the
> > > > "trailing drbd meta data" is of no concern.
> > > > You may have to "mount -t ext4" (or xfs or whatever), if your mount
> > > > and libblkid decide that this was a "drbd" type and could not be
> > > > mounted. They are just trying to help, really.
> > > > which is good. but in that case they get it wrong.
> > >
> > > Okay, just so I understand
> > >
> > > Suppose I turn md4 into a PV and create one volume group
> > > 'vg_under_drbd0', and logical volume 'lv_under_drbd0' that takes 95%
> > > of the space, leaving 5% for snapshots.
> > >
> > > Then I create my ext4 filesystem directly on drbd0.
> > >
> > > At backup time, I quiesce the MySQL instances and create a snapshot of
> > > the drbd disk.
> > >
> > > I can then mount the drbd snapshot as a filesystem?
> > >
> >
> > Disregard question. I tested it. Works fine. Mind blown.
> >
> > -Eric
> >
>
> Although I discovered quite by accident that you can mount a snapshot over
> the top of the filesystem that exists on the device that it's a snapshot
> of. Wouldn't this create some sort of recursive write death spiral?
>

I think this can help with properly understanding LVM snapshots and answer your
question: https://www.clevernetsystems.com/lvm-snapshots-explained/
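
As an aside, the usual pattern for the backup case described earlier in the thread is to mount the snapshot on a separate mountpoint rather than over the live filesystem. A minimal sketch, reusing the LV names from Eric's example below (the rsync target is purely illustrative):

lvcreate -s -L30G -n drbd0_snapshot /dev/vg_under_drbd0/lv_under_drbd0
mkdir -p /mnt/drbd0_snap
mount -o ro /dev/vg_under_drbd0/drbd0_snapshot /mnt/drbd0_snap
rsync -a /mnt/drbd0_snap/ archive:/backups/ha01_mysql/
umount /mnt/drbd0_snap
lvremove -y /dev/vg_under_drbd0/drbd0_snapshot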


> Check it out...
>
> [root@001db01a /]# lvdisplay
>   --- Logical volume ---
>   LV Path                /dev/vg_under_drbd1/lv_under_drbd1
>   LV Name                lv_under_drbd1
>   VG Name                vg_under_drbd1
>   LV UUID                LWWPiL-Y6nR-cNnW-j2E9-LAK9-UsXm-3inTyJ
>   LV Write Access        read/write
>   LV Creation host, time 001db01a, 2018-07-28 04:53:14 +
>   LV Status              available
>   # open                 2
>   LV Size                1.40 TiB
>   Current LE             367002
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     8192
>   Block device           253:1
>
>   --- Logical volume ---
>   LV Path                /dev/vg_under_drbd0/lv_under_drbd0
>   LV Name                lv_under_drbd0
>   VG Name                vg_under_drbd0
>   LV UUID                M2oMNd-hots-d9Pf-KQG8-YPqh-6x3a-r6wBqo
>   LV Write Access        read/write
>   LV Creation host, time 001db01a, 2018-07-28 04:52:59 +
>   LV Status              available
>   # open                 2
>   LV Size                1.40 TiB
>   Current LE             367002
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     8192
>   Block device           253:0
>
> [root@001db01a /]# df -h
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/sda2        30G  3.3G   27G  12% /
> devtmpfs 63G 0   63G   0% /dev
> tmpfs63G 0   63G   0% /dev/shm
> tmpfs63G  9.0M   63G   1% /run
> tmpfs63G 0   63G   0% /sys/fs/cgroup
> /dev/sda1   497M   78M  420M  16% /boot
> /dev/sdb1   252G   61M  239G   1% /mnt/resource
> tmpfs13G 0   13G   0% /run/user/0
> /dev/drbd0  1.4T  2.1G  1.4T   1% /ha01_mysql
> [root@001db01a /]#
> [root@001db01a /]# ls /ha01_mysql
> lost+found  testfile
> [root@001db01a /]#
> [root@001db01a /]# lvcreate -s -L30G -n drbd0_snapshot
> /dev/vg_under_drbd0/lv_under_drbd0
>   Logical volume "drbd0_snapshot" created.
> [root@001db01a /]#
> [root@001db01a /]# mount /dev/vg_under_drbd0/drbd0_snapshot /ha01_mysql
> [root@001db01a /]#
> [root@001db01a /]# cd /ha01_mysql
> [root@001db01a ha01_mysql]# ls
> lost+found  testfile
> [root@001db01a ha01_mysql]# echo blah > blah.txt
> [root@001db01a ha01_mysql]# ll
> total 2097172
> -rw-r--r--. 1 root root  5 Jul 28 14:50 blah.txt
> drwx--. 2 root root  16384 Jul 28 14:10 lost+found
> -rw-r--r--. 1 root root 2147479552 Jul 28 14:20 testfile
> [root@001db01a ha01_mysql]# cd /
>  [root@001db01a /]# umount /ha01_mysql
> [root@001db01a /]# ls /ha01_mysql
> lost+found  testfile
> [root@001db01a /]#
>
> What? I know nothing.
>
> --Eric
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

Re: [DRBD-user] Content of DRBD volume is invalid during sync after disk replace

2018-07-26 Thread Igor Cicimov
Hi,

On Fri, Jul 27, 2018 at 1:36 AM, Lars Ellenberg 
wrote:

> On Mon, Jul 23, 2018 at 02:46:25PM +0200, Michal Michaláč wrote:
> >  Hello,
> >
> >
> >
> > after replacing backing device of DRBD, content of DRBD volume (not only
> > backing disk) is invalid on node with inconsistent backing device, until
> > sync finishes. I think, correct behaviour is to read data from peer's
> > (consistent) backing device if process running on node with inconsistent
> > backing device wants to read unsynchronized part of DRBD volume.
>
> ...
>
> > If I  skip create-md (step 4), situation is even worse - after attach
> disk,
> > DRBD says volume is sychronized(!):
> >
> > log: Began resync as SyncTarget (will sync 0 KB [0 bits set])
> >
> > but after verification (drbdadm verify test), there are many out-of-sync
> > sectors.
> >
> > After disconnect/connect volume test, resync not started(!):
> >
> > log: No resync, but 3840 bits in bitmap!
> >
> > If I (on new DRBD volume) just disconnect -> write changes to primary ->
> > connect, sync works correctly.
>
> > Versions (on both nodes are identical):
> > # cat /proc/drbd
> > version: 9.0.14-1 (api:2/proto:86-113)
> > GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by root@n2,
> > 2018-07-12 13:18:02
> >
> > Transports (api:16): tcp (9.0.14-1)
> >
> > # uname -a
> > Linux n2 4.15.17-1-pve #1 SMP PVE 4.15.17-9 (Wed, 9 May 2018 13:31:43
> +0200)
> > x86_64 GNU/Linux
> >
> > # lvm version
> >   LVM version: 2.02.168(2) (2016-11-30)
> >   Library version: 1.02.137 (2016-11-30)
> >   Driver version:  4.37.0
>
> > Is it bug or am I doing something wrong?
>
> Thanks for the detailed and useful report,
> definitely a serious and embarrassing bug,
> now already fixed internally.
> Fix will go into 9.0.15 final.
> We are in the process of making sure
> we have covered all variants and loose ends of this.
>

Is this going to get backported to 8.4 as well?
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd+lvm no bueno

2018-07-26 Thread Igor Cicimov
On Fri, Jul 27, 2018 at 3:51 AM, Eric Robinson 
wrote:

> > On Thu, Jul 26, 2018 at 04:32:17PM +0200, Veit Wahlich wrote:
> > > Hi Eric,
> > >
> > > Am Donnerstag, den 26.07.2018, 13:56 + schrieb Eric Robinson:
> > > > Would there really be a PV signature on the backing device? I didn't
> > > > turn md4 into a PV (did not run pvcreate /dev/md4), but I did turn
> > > > the drbd disk into one (pvcreate /dev/drbd1).
> >
> > Yes (please view in a fixed-width font):
> >
> > | PV signature | VG extent pool ...                | drbd metadata |
> > |<--------------------- drbd1 -------------------------------------->| md metadata |
> > |<------------------------------- md4 ---------------------------------------------->|
> > |<-------------- component drives ... of ... md4 ------------------------------------->|
> >
> > > both DRBD and mdraid put their metadata at the end of the block
> > > device, thus depending on LVM configuration, both mdraid backing
> > > devices as well as DRBD minors bcking VM disks with direct-on-disk PVs
> > > might be detected as PVs.
> > >
> > > It is very advisable to set lvm.conf's global_filter to allow only the
> > > desired devices as PVs by matching a strict regexp, and to ignore all
> > > other devices, e.g.:
> > >
> > >  global_filter = [ "a|^/dev/md.*$|", "r/.*/" ]
> > >
> > > or even more strict:
> > >
> > >  global_filter = [ "a|^/dev/md4$|", "r/.*/" ]
> >
> > Uhm, no.
> > Not if he wants DRBD to be his PV...
> > then he needs to exclude (reject) the backend, and only include (accept)
> the
> > DRBD.
> >
> > But yes, I very much recommend to put an explicit white list of the
> to-be-
> > used PVs into the global filter, and reject anything else.
> >
> > Note that these are (by default unanchored) regexes, NOT glob patterns.
> > (Above examples get that one right, though r/./ would be enough...
> > but I have seen people get it wrong too many times, so I thought I'd
> mention
> > it here again)
> >
> > > After editing the configuration, you might want to regenerate your
> > > distro's initrd/initramfs to reflect the changes directly at startup.
> >
> > Yes, don't forget that step ^^^ that one is important as well.
> >
> > But really, most of the time, you really want LVM *below* DRBD, and NOT
> > above it. Even though it may "appear" to be convenient, it is usually
> > not what
> > you want, for various reasons, one of them being performance.
>
> Lars,
>
> I put MySQL databases on the drbd volume. To back them up, I pause them
> and do LVM snapshots (then rsync the snapshots to an archive server). How
> could I do that with LVM below drbd, since what I want is a snapshot of the
> filesystem where MySQL lives?
>
> How severely does putting LVM on top of drbd affect performance?
>
> >
> > Cheers,
> >
> > --
> > : Lars Ellenberg


It depends. I would say it is not unusual to end up with a setup where DRBD
is sandwiched between a top and a bottom LVM layer due to requirements or
convenience. For example, in case of master-master with GFS2:

iscsi,raid -> lvm -> drbd -> clvm -> gfs2

Apart from the clustered LVM on top of DRBD (which is the Red Hat recommended
approach) you also get the benefit of easily extending the DRBD device(s) due
to the underlying LVM.
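
To tie that back to Lars' filter advice above for the DRBD-as-PV layout Eric uses (PVs on /dev/drbd0 and /dev/drbd1, not on md4): a minimal lvm.conf sketch, to be adapted to the real device names, would whitelist only the DRBD devices and reject everything else:

# /etc/lvm/lvm.conf, devices { } section (illustrative):
#   global_filter = [ "a|^/dev/drbd.*|", "r|.*|" ]

# then rebuild the initramfs so the filter is also active at boot, e.g. on EL7:
dracut -f
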
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Pacemaker unable to start DRBD

2018-07-24 Thread Igor Cicimov
Hi Jaco,

On Mon, Jul 23, 2018 at 11:10 PM, Jaco van Niekerk 
wrote:

> Hi
> I am using the following packages:
> pcs-0.9.162-5.el7.centos.1.x86_64
> kmod-drbd84-8.4.11-1.el7_5.elrepo.x86_64
> drbd84-utils-9.3.1-1.el7.elrepo.x86_64
> pacemaker-1.1.18-11.el7_5.3.x86_64
> corosync-2.4.3-2.el7_5.1.x86_64
> targetcli-2.1.fb46-6.el7_5.noarch
>
> my /etc/drbd.conf
> global { usage-count no; }
> common { protocol C; }
> resource imagesdata {
>  on node1.san.localhost {
>   device /dev/drbd0;
>   disk   /dev/vg_drbd/lv_drbd;
>   address  192.168.0.2:7789;
>   meta-disk internal;
>  }
>  on node2.san.localhost {
>   device /dev/drbd0;
>   disk   /dev/vg_drbd/lv_drbd;
>   address  192.168.0.3:7789;
>   meta-disk internal;
>  }
> }
>
> my /etc/corosync/corosync.conf:
> totem {
> version: 2
> secauth: off
> cluster_name: san_cluster
> transport: udpu
> interface {
> ringnumber: 0
> bindnetaddr: 192.168.0.0
> broadcast: yes
> mcastport: 5405
> }
> }
>
> nodelist {
> node {
> ring0_addr: node1.san.localhost
> name: node1
> nodeid: 1
> }
>
> node {
> ring0_addr: node2.san.localhost
> name: node2
> nodeid: 2
> }
> }
>
> quorum {
> provider: corosync_votequorum
> two_node: 1
> wait_for_all: 1
> last_man_standing: 1
> auto_tie_breaker: 0
> }
>
> logging {
> to_logfile: yes
> logfile: /var/log/cluster/corosync.log
> to_syslog: yes
> }
>
> Pacemaker setup:
> pcs cluster auth node1.san.localdomain node2.san.localdomain -u hacluster
> -p PASSWORD
> pcs cluster setup --name san_cluster node1.san.localdomain
> node2.san.localdomain
> pcs cluster start --all
> pcs cluster enable --all
> pcs property set stonith-enabled=false
> pcs property set no-quorum-policy=ignore
>
> The following command doesn't work:
> pcs resource create my_iscsidata ocf:linbit:drbd
> *drbd_resource=iscsidata* op monitor interval=10s
> pcs resource master MyISCSIClone my_iscsidata master-max=1
> master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>
> I receive the following on pcs status:
> * my_iscsidata_monitor_0 on node2.san.localhost 'not configured' (6):
> call=9, status=complete, exitreason='meta parameter misconfigured, expected
> clone-max -le 2, but found unset.',
>
>
I guess it's a typo. See the highlighted part above: *drbd_resource* has
to match the DRBD resource you created, which in your case would be
*imagesdata*.
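
In other words, keeping everything else from the original commands and only changing the resource name, something like this should presumably work:

pcs resource create my_iscsidata ocf:linbit:drbd \
    drbd_resource=imagesdata op monitor interval=10s
pcs resource master MyISCSIClone my_iscsidata \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
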
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Pacemaker could not switch drbd nodes

2018-03-26 Thread Igor Cicimov
Hi,

On Fri, Mar 23, 2018 at 9:01 AM, Lozenkov Sergei 
wrote:

> Hello.
> I have two Debian 9 servers with configured  Corosync-Pacemaker-DRBD. All
> work well for month.
> After some servers issues (with reboots) I have situation that pacemaker
> could not switch drbd node with such errors:
>
> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com   lrmd:   notice:
> operation_finished: drbd_nfs_stop_0:3667:stderr [ 1: State change
> failed: (-12) Device is held open by someone ]
>
> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com   lrmd:   notice:
> operation_finished: drbd_nfs_stop_0:3667:stderr [ Command 'drbdsetup-84
> secondary 1' terminated with exit code 11 ]
>
> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com   lrmd: info:
> log_finished:   finished - rsc:drbd_nfs action:stop call_id:47 pid:3667
> exit-code:1 exec-time:20002ms queue-time:0ms
>
> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com   crmd:error:
> process_lrm_event:  Result of stop operation for drbd_nfs on
> nfs01-az-eus.tech-corps.com: Timed Out | call=47 key=drbd_nfs_stop_0
> timeout=2ms
>
> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com   crmd:   notice:
> process_lrm_event:  nfs01-az-eus.tech-corps.com-drbd_nfs_stop_0:47 [
> 1: State change failed: (-12) Device is held open by someone\nCommand
> 'drbdsetup-84 secondary 1' terminated with exit code 11\n1: State change
> failed: (-12) Device is held open by someone\nCommand 'drbdsetup-84
> secondary 1' terminated with exit code 11\n1: State change failed: (-12)
> Device is held open by someone\nCommand 'drbdsetup-84 secondary 1'
> terminated with exit
>
> I tried to resolve the issue with many googled receipts but all attempts
> were unsuccessful.
> As well I have another two node cluster with exactly the same
> configuration and it works without any issues.
>
> Right now I placed nodes to standby mode and manually raised all services.
> Please, could You help me to analyze and solve the problem?
> Thanks
>
> Here are my configuration files:
> --- CRM CONFIG ---
> crm configure show
> node 171049224: nfs01-az-eus.tech-corps.com \
> attributes standby=off
> node 171049225: nfs02-az-eus.tech-corps.com \
> attributes standby=on
> primitive drbd_nfs ocf:linbit:drbd \
> params drbd_resource=nfs \
> op monitor interval=29s role=Master \
> op monitor interval=31s role=Slave
> primitive fs_nfs Filesystem \
> params device="/dev/drbd1" directory="/data" fstype=ext4 \
> meta is-managed=true
> primitive nfs lsb:nfs-kernel-server \
> op monitor interval=5s
> primitive nmbd lsb:nmbd \
> op monitor interval=5s
> primitive smbd lsb:smbd \
> op monitor interval=5s
> group NFS fs_nfs nfs nmbd smbd
> ms ms_drbd_nfs drbd_nfs \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
> order fs-nfs-before-nfs inf: fs_nfs:start nfs:start
> order fs-nfs-before-nmbd inf: fs_nfs:start nmbd:start
> order fs-nfs-before-smbd inf: fs_nfs:start smbd:start
> order ms-drbd-nfs-before-fs-nfs inf: ms_drbd_nfs:promote fs_nfs:start
> colocation ms-drbd-nfs-with-ha inf: ms_drbd_nfs:Master NFS
> order nmbd-before-smbd inf: nmbd:start smbd:start
> property cib-bootstrap-options: \
> have-watchdog=false \
> dc-version=1.1.16-94ff4df \
> cluster-infrastructure=corosync \
> cluster-name=debian \
> stonith-enabled=false \
> no-quorum-policy=ignore
>
>
>
> --- DRBD GLOBAL ---
> cat /etc/drbd.d/global_common.conf | grep -v '#'
>
> global {
> usage-count no;
> }
>
> common {
> protocol C;
>
> handlers {
>
> }
>
> startup {
> }
>
> options {
> }
>
> disk {
> }
>
> net {
> }
> }
>
>
> --- DRBD -RESOURCE ---
> cat /etc/drbd.d/nfs.res | grep -v '#'
> resource nfs{
>   meta-disk internal;
>   device /dev/drbd1;
>   syncer {
> verify-alg sha1;
> rate 100M;
>   }
>
>   net{
> max-buffers 8000;
> max-epoch-size 8000;
> unplug-watermark 16;
> sndbuf-size 0;
>   }
>
>   disk{
> disk-barrier no;
> disk-flushes no;
>   }
>
>   on nfs01-az-eus.tech-corps.com{
> disk /dev/sdc1;
> address 10.50.1.8:7789;
>   }
>
>   on nfs02-az-eus.tech-corps.com{
> disk /dev/sdc1;
> address 10.50.1.9:7789;
>   }
> }
>
>
>
>
> --
> Segey L
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
Did you check with fuser what is holding the device/filesystem busy?
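
For example (mountpoint and device names taken from the configuration above):

fuser -vm /data       # processes using the filesystem mounted by fs_nfs
lsof /dev/drbd1       # anything holding the DRBD device itself open
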
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] ZFS storage backend failed

2018-02-20 Thread Igor Cicimov
On Tue, Feb 20, 2018 at 9:55 PM, Julien Escario  wrote:

> Le 10/02/2018 à 04:39, Igor Cicimov a écrit :
> > Did you tell it
> > to? https://docs.linbit.com/doc/users-guide-84/s-
> configure-io-error-behavior/
>
> Sorry for the late answer : I moved on performance tests with a ZFS RAID1
> backend. I'll retry backend failure a little later.
>
> But ... as far as I understand, 'detach' behavior should be the default no
> ?
>

I think the default is/was for DRBD to "pass-on" the error to the higher
layer, which should decide itself how to handle it.
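
For reference, the behaviour is configured per resource (or in common) in the disk section, along the lines of:

disk {
  on-io-error detach;   # drop the failing backing device and continue
                        # diskless, serving I/O via the peer
}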


> My thought is that DRBD wasn't notified or didn't detect the blocked IOs on
> the
> backend. Perhaps a specific bahevior of ZFS.
>
> More tests to come.
>
> Best regards,
> Julien Escario
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] ZFS storage backend failed

2018-02-09 Thread Igor Cicimov
On 10 Feb 2018 5:02 am, "Julien Escario"  wrote:

Hello,
I'm just doing a lab about zpool as storage backend for DRBD (storing VM
images
with Proxmox).

Right now, it's pretty good once tuned and I've been able to achieve 500MB/s
write speed with just a little curiosity about concurrent write from both
hypervisors cluster but that's not the point here.

To complete resiliency tests, I simply unplugged a disk from a node. My
thought
was that DRBD was just going to detect the ZFS failure and detach the resources
from the failed device.


Did you tell it to?
https://docs.linbit.com/doc/users-guide-84/s-configure-io-error-behavior/


But ... nothing. I/O just hangs on VMs ran on the 'failed' node.

My zpool status :

NAMESTATE READ WRITE CKSUM
drbdpoolUNAVAIL  0 0 0  insufficient replicas
  sda   UNAVAIL  0 0 0

but drbdadm show this for locally hosted VM (on the failed node) :
vm-101-disk-1 role:Primary
  disk:UpToDate
  hyper-test-02 role:Secondary
peer-disk:UpToDate

and remote VM (on the 'sane' node from failed node point of view) :
vm-104-disk-1 role:Secondary
  disk:Consistent
  hyper-test-02 connection:NetworkFailure


So it seems that DRBD didn't detect the I/O failure.

Is there a way to force automatic failover in this case? I probably missed
a
detection mechanism.

Best regards,
Julien Escario

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Understanding "classic" 3 node set up.

2018-02-09 Thread Igor Cicimov
On 9 Feb 2018 7:30 pm, "Paul O'Rorke"  wrote:

In a functional classic 3 node set up what is the expected state of the
secondary node that is the lower resource with regards the upper resource?

trk-kvm-01 (primary for both resource levels)

root@trk-kvm-01:~/scripts# drbd-overview
[...]
 12:convirt/0    Connected Primary/Secondary
UpToDate/UpToDate _convirt   vda virtio
[...]
120:convirt-U/0 ^^12 Connected Primary/Secondary
UpToDate/UpToDate

trk-kvm-02 (secondary lower resource)

root@trk-kvm-02:~/scripts# drbd-overview
[...]
 12:convirt/0    Connected Secondary/Primary
UpToDate/UpToDate _convirt   vda virtio
[...]
120:convirt-U/0 ^^12 Unconfigured .
.

trk-kvm-03 (secondary on upper resource)

root@trk-kvm-03:/media/scratch# drbd-overview
[...]
120:convirt-U/0  Connected Secondary/Primary UpToDate/UpToDate

I see nowhere to set this node's IP yet the docs say this resource file
"must be distributed across all nodes in the cluster — in this case, three
nodes."  What is the point of the convirt-U res file on trk-kvm-02?

Well I guess if you lose the first node someone needs to take over its
role, right? Never used stacked resources so it is just a guess.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD9 configuration with openstack-cinder-volume on multiple nodes

2017-11-26 Thread Igor Cicimov
Hi Marco,

On 23 Nov 2017 7:05 am, "Marco Marino"  wrote:

Hi, I'm trying to configure drbd9 with openstack-cinder.
Actually my (simplified) infrastructure is composed by:
- 2 drbd9 nodes with 2 NICs on each node, one for the "replication" network
(without using a switch) and one for the "storage" network.
- 1 compute node with a dedicated NIC connected to the storage network
- 1 controller with openstack-cinder-volume installed on it.

Please, review the configuration at https://www.draw.io/#
G1P2uJ9LoXc0bJdNRS9m2fN5cuik73t7_T

My questions are:
1) Using the DRBD Transport, can I connect directly compute nodes to the
drbd storage without passing through the cinder-volume node?
2) if yes, what I have to install on each node? I suppose:
drbd9 kernel module + utils + drbdmanage on DRBD1 and DRBD2
drbd9 kernel module + utils on COMPUTE nodes
drbdmanage on openstack-cinder-volume (???)

3) How should I configure openstack-cinder-volume in drbdmanage???
(External node or whatelse?) Please note that replication network and
storage network are 2 different subnets! IPs I've used in drbdmanage when I
created the drbd cluster belongs to the replication network. Is this
correct?

192.168.10.0/24 = replication network
192.168.20.0/24 = storage network

I'm sorry, but I'm a bit confused about this configuration


Wouldn't you just install Cinder together with drbd on the drbd nodes? Then
you just give cinder the drbd device for its lvm to serve block devices
from via iSCSI as per usual.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Some info

2017-10-11 Thread Igor Cicimov
On 12 Oct 2017 5:10 am, "Gandalf Corvotempesta" <
gandalf.corvotempe...@gmail.com> wrote:

Previously i've asked about DRBDv9+ZFS.
Let's assume a more "standard" setup with DRBDv8 + mdadm.

What I would like to archieve is a simple redundant SAN. (anything
preconfigured for this ?)

Which is best, raid1+drbd+lvm or drbd+raid1+lvm?

Any advantage by creating multiple drbd resources ? I think that a
single DRBD resource is better for administrative point of view.

A simple failover would be enough, I don't need master-master


In that case you might go with raid -> lvm -> drbd -> lvm to benefit from
LVM in both layers.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Question reg. protocol C

2017-09-11 Thread Igor Cicimov
On 11 Sep 2017 4:20 pm, "Ravi Kiran Chilakapati" <
ravikiran.chilakap...@gmail.com> wrote:

Thank you for the response Roland.

I will start going through the source code. In the meantime, it will be
great if these preliminary questions can be answered.

Q: Is Protocol C a variant of any standard atomic commit protocol (like
2PC/3PC etc.)? Or is it a proprietary algorithm?
Q: Let's assume there are 2 disks (D1, D2). Let's assume that D2 is
experiencing a fail-recover situation, but D1 has failed after a D2
failure, but before D2 has recovered. What is the behavior of DRBD in such
a case? Are all future disk writes blocked until both D1 and D2 are
available, and are confirmed to be in sync?

You need to explain what exactly D1 and D2 are. Are they 2 disks composing
a single volume that is used as a backing device of a single DRBD resource?
Or maybe they are disks for separate volumes of a single DRBD resource?
Or maybe each is a backing device of 2 separate DRBD resources? Or maybe
... etc. etc. etc.


Regards,
Ravi.

On Fri, Sep 8, 2017 at 2:07 AM, Roland Kammerer 
wrote:

> On Thu, Sep 07, 2017 at 04:08:10PM -0700, Ravi Kiran Chilakapati wrote:
> > Hi,
> >
> > I was wondering where I could find more information on DRBD's Protocol
> C? I
> > was hoping I could find a resource without resorting to the source code.
> >
> > I tried the following with no luck.
> > 1) asking around on Stackoverflow (
> > https://stackoverflow.com/questions/45998076/an-explanation-
> of-drbd-protocol-c
> > )
> > 2) scanning the user guide (http://docs.linbit.com/docs/users-guide-9.0/
> )
>
> There is no official "user-friendly" documentation besides the UG. So
> ask specific questions and we can answer them (and probably add them to
> the UG in some high level section) or read the code...
>
> Regards, rck
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual primary and LVM

2017-07-27 Thread Igor Cicimov
Hey Gionatan,

On Thu, Jul 27, 2017 at 7:04 PM, Gionatan Danti  wrote:

> On 27-07-2017 10:23 Igor Cicimov wrote:
>
>>
>> When in cluster mode LVM will not use local cache that's part of the
>> configuration you need to do during setup.
>>
>>
> Hi Igor, I am not referring to LVM's metadata cache. I speak about the
> kernel I/O buffers (ie: the one you can see from "free -m" under the buffer
> column) which, in some case, work similarly to a "real" pagecache.
>
Well, I don't see how this is directly related to the dual-primary setup since
even with a single primary whatever is not yet committed to disk is not
replicated to the secondary either. So in case you lose the primary, whatever
was in its buffers at the time is gone as well.

But the rule of thumb, let's say, would be to have as few cache layers as
possible without impacting performance while retaining data consistency
at the same time. With VMs you have an additional cache layer in the guest as
well as the one in the host. There are many documents discussing cache
modes, like these for example:
https://www.suse.com/documentation/sles11/book_kvm/data/sect1_1_chapter_book_kvm.html,
https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaat/liaatbpkvmguestcache.htm,
https://pve.proxmox.com/wiki/Performance_Tweaks

So which write cache mode you will use really depends on the specific
hardware you use, the amount of system RAM, the OS sysctl settings (i.e. how
often you flush to disk, params like vm.dirty_ratio,
vm.dirty_background_ratio etc.), the disk types/speed, and the HW RAID
controller (for example with battery-backed cache or not), e.g. DRBD has some
tuning parameters like:

disk-flushes no;
md-flushes no;
disk-barrier no;

which makes it possible to use the write-back caching on the *BBU-backed*
RAID controller instead of flushing directly to disk. So many factors are
in play but the main idea is to reduce the number of caches (or their
caching time) between the data and the disk as much as possible without
losing data or performance.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual primary and LVM

2017-07-27 Thread Igor Cicimov
On Thu, Jul 27, 2017 at 9:04 PM, Igor Cicimov <
ig...@encompasscorporation.com> wrote:

> Hey Gionatan,
>
> On Thu, Jul 27, 2017 at 7:04 PM, Gionatan Danti 
> wrote:
>
>> Il 27-07-2017 10:23 Igor Cicimov ha scritto:
>>
>>>
>>> When in cluster mode LVM will not use local cache that's part of the
>>> configuration you need to do during setup.
>>>
>>>
>> Hi Igor, I am not referring to LVM's metadata cache. I speak about the
>> kernel I/O buffers (ie: the one you can see from "free -m" under the buffer
>> column) which, in some case, work similarly to a "real" pagecache.
>>
>> ​Well don't see how is this directly related to dual-primary setup since
> even with single primary what ever is not yet committed to disk is not
> replicated to the secondary as well. So in case you loose the primary what
> ever was in its buffers at the time is gone as well.
> ​
>
> ​But the rule-of-thumb lets say would be to have as less cache layers as
> possible without impact on the performance and retain the data consistency
> in the same time. With VMs you have additional cache layer in the guest as
> well as the one in the host. There are many documents discussing cache
> modes like these https://www.suse.com/documentation/sles11/book_kvm/
> data/sect1_1_chapter_book_kvm.html, https://www.ibm.com/support/
> knowledgecenter/en/linuxonibm/liaat/liaatbpkvmguestcache.htm,
> https://pve.proxmox.com/wiki/Performance_Tweaks for example.
>
> So which write cache mode you will use really depends on the specific
> hardware you use, the system amount of RAM, the OS sysctl settings (ie how
> often you flush to dis, params like vm.dirty_ratio,
> vm.dirty_background_ratio etc.), the disk types/speed, the HW RAID
> controller (for example with battery backed cache or not) ie DRBD has some
> tuning parameters like:
>
> disk-flushes no;
> md-flushes no;
> disk-barrier no;
>
> which makes it possible to use the write-back caching on the *BBU-backed*
> RAID controller instead of flushing directly to disk. So many factors are
> in play but the main idea is to reduce the number of caches (or their
> caching time) between the data and the disk as much as possible without
> loosing data or performance.
>

And in case of live migration I'm sure the tool you decide to use will
freeze the guest and make a sync() call to flush the OS cache *before*
stopping and starting the guest on the other node.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual primary and LVM

2017-07-27 Thread Igor Cicimov
On 27 Jul 2017 6:11 pm, "Gionatan Danti"  wrote:

On 27-07-2017 09:38 Gionatan Danti wrote:

>
> Thanks for your input. I also read your excellent suggestions on link
> Igor posted.
>
>
To clarify: the main reason I am asking about the feasibility of a
dual-primary DRBD setup with LVs on top of it is about cache coherency. Let
me take a step back: the given explanation for denying even read access on a
secondary node is of broken cache coherency/consistency: if the read/write
node writes something the secondary node had previously read, the latter
will not recognize the changes done by the first node.


When in cluster mode, LVM will not use a local cache; that's part of the
configuration you need to do during setup.

The canonical solution to this problem is to use a dual-primary setup with
a clustered filesystem (eg: GFS2) which not only arbitrates write access,
but maintains read cache consistency also.

Now, let's remove the clustered filesystem layer, leaving "naked" LVs only.
How is read cache coherency maintained in this case? As no filesystem is
layered on top of the raw LVs, there is no real pagecache at work, but the
kernel's buffers remain, and they need to be made coherent. How does DRBD
achieve this? Does it update the receiving kernel I/O buffers each time
the other node writes something?

Thanks.


-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual primary and LVM

2017-07-26 Thread Igor Cicimov
Hi Gionatan,

On Thu, Jul 27, 2017 at 12:14 AM, Gionatan Danti  wrote:

> Hi all,
> I have a possibly naive question about a dual primary setup involving LVM
> devices on top of DRBD.
>
> The main question is: using cLVM or native LVM locking, can I safely use a
> LV block device on the first node, *close it*, and reopen it on the second
> one? No filesystem is involved and no host is expected to concurrently use
> the same LV.
>
> Scenario: two CentOS 7 + DRBD 8.4 nodes with LVs on top of DRBD on top of
> a physical RAID array. Basically, DRBD replicate anything written to the
> specific hardware array.
>
> Goal: having a redundant virtual machine setup, where vms can be live
> migrated between the two hosts.
>
> Current setup: I currently run a single-primary, dual nodes setup, where
> the second host has no access at all to any LV. This setup worked very well
> in the past years, but it forbid using live migration (the secondary host
> has no access to the LV-based vdisk attached to the vms, so it is
> impossible to live migrate the running vms).
>

> I thought to use a dual-primary setup to have the LVs available on *both*
> nodes, using a lock manager to arbitrate access to them.
>
> How do you see such a solution? It is workable? Or would you recommend to
> use a clustered filesystem on top of the dual-primary DRBD device?
>
>
I would recommend going through this lengthy post
http://lists.linbit.com/pipermail/drbd-user/2011-January/015236.html
covering all pros and cons of several possible scenarios.

The easiest scenario for dual-primary DRBD would be a DRBD device per VM,
so something like RAID -> PV -> VG -> LV -> DRBD -> VM, where you
don't even need LVM locking (since that layer is not even exposed to the
user) and which is great for dual-primary KVM clusters. You get live migration
and also keep the resizing functionality, since you can grow the
underlying LV and then the DRBD itself to increase the VM disk size, let's
say. The VM needs to be started on one node only of course, so you (or your
software) need to make sure this is always the case. One huge drawback of
this approach though is the large number of DRBD devices to maintain in case
of hundreds of VMs! Although since you have already committed to a different
approach this might not be possible at this point.

Note: In this case though you don't even need dual primary since the DRBD
for each VM can be independently promoted to primary on any node at any
time. In case of a single DRBD it is all-or-nothing, so there is no possibility
of migrating individual VMs.
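
A per-VM resource in that layout could look something like the sketch below; the names, minor numbers and addresses are purely illustrative, and there would be one such file per VM:

resource vm-foo {
  protocol C;
  device    /dev/drbd10;
  disk      /dev/vg0/lv_vm_foo;   # the LV dedicated to this VM
  meta-disk internal;

  on node1 {
    address 192.168.1.1:7790;
  }
  on node2 {
    address 192.168.1.2:7790;
  }
}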

Now adding LVs on top of DRBD is a bit more complicated. I guess your current
setup is something like this?

                            LV1 -> VM1
RAID -> DRBD -> PV -> VG -> LV2 -> VM2
                            LV3 -> VM3
                            ...

In this case when DRBD is in dual-primary the DLM/cLVM setup is imperative
so that LVM knows who has write access. But then on VM migration
"something" needs to shut down the VM on Node1 to release the LVM lock and
start the VM on Node2. Same as above, as long as each VM is running on
*only one* node you should be fine; the moment you start it on both you
will probably corrupt your VM. Software like Proxmox should be able to help
you on both points.

Which brings me to an important question I should have asked at the very
beginning actually: what do you use to manage the cluster?? (if anything)

Regards,
Igor
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] BUG DRBD9/Proxmox4: DRBD won't start automatically

2017-06-20 Thread Igor Cicimov
On Tue, Jun 20, 2017 at 9:25 PM, Dominic Pratt  wrote:

> > There is a systemd drbdmanged.service file
>
>
>
> Indeed there is a service file, but it’s obviously deactivated by default.
>
>
>
> Is this the desired behavior? We think it would be better to just enable
> the service after the drbd initialization or immediately after the
> installation of the drbd packages.
>

Don't forget that the DRBD resources might be managed by something other than
drbd/drbdmanage, like Pacemaker for example, so not starting them
automatically is the right thing to do IMO.


>
>
> Kind regards,
>
> Dominic Pratt
>
> *From:* drbd-user-boun...@lists.linbit.com [mailto:drbd-user-bounces@
> lists.linbit.com] *On behalf of *Dominic Pratt
> *Sent:* Monday, 19 June 2017 16:04
> *To:* 'drbd-user@lists.linbit.com' 
> *Subject:* [DRBD-user] BUG DRBD9/Proxmox4: DRBD won't start automatically
>
>
>
> Hi guys,
>
>
>
> we’re experiencing a strange behaviour when we reboot our no-storage node
> in the three-node-cluster.
>
>
>
> DRBD won’t start automatically on this node and we have to start DRBD
> through „drbdmanage startup“. After that, the output of „drbdmanage
> list-nodes“ is correct and lists all nodes.
>
>
>
> Shouldn’t DRBD start automatically? There’s no startup script.
>
>
> Kind regards,
>
> Dominic Pratt
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>


-- 
Igor Cicimov | DevOps


p. +61 (0) 433 078 728
e. ig...@encompasscorporation.com
w. www.encompasscorporation.com
a. Level 4, 65 York Street, Sydney 2000
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Strange status with drbd-overview

2017-06-14 Thread Igor Cicimov
On 14 Jun 2017 5:23 pm, "Roland Kammerer" 
wrote:

On Wed, Jun 14, 2017 at 11:26:25AM +1000, Igor Cicimov wrote:
> Hi Roland,
>
> I noticed issues with 8.4.x not being able to compile on this very
> 4.4.67-1-pve kernel (it is working fine on the 4.4.8 I upgraded from). Not
> sure if it is due to some custom kernel changes done by the Proxmox team
in
> this particular kernel so have to ask if you are aware of any
> incompatibility in the latest 4.4 kernels or is this PVE specific?

The 'x' is a bit unspecific, but anyways, I'm not aware of any problems
and I'm not able to reproduce them:

$ uname -r
4.4.67-1-pve

$ make -j8
...
make -C /lib/modules/4.4.67-1-pve/build SUBDIRS=/root/src/drbd-8.4.10-1/drbd
modules
...
LD [M]  /root/src/drbd-8.4.10-1/drbd/drbd.o
Building modules, stage 2.
MODPOST 1 modules
CC  /root/src/drbd-8.4.10-1/drbd/drbd.mod.o
LD [M]  /root/src/drbd-8.4.10-1/drbd/drbd.ko
mv .drbd_kernelrelease.new .drbd_kernelrelease
Memorizing module configuration ... done.
make[1]: Leaving directory '/root/src/drbd-8.4.10-1/drbd'

Module build was successful.


Hmmm ok what I tried was 8.4.[78]-1 and that didn't work.



> Sorry I'm not trying to hijack the thread, although it looks like that,

Then don't do it, threads are cheap ;-)


Yeah, but time is not ;-) and I don't want to waste it troubleshooting
something that is already known.


Regards, rck
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Strange status with drbd-overview

2017-06-13 Thread Igor Cicimov
Hi Roland,

I noticed issues with 8.4.x not being able to compile on this very
4.4.67-1-pve kernel (it is working fine on the 4.4.8 I upgraded from). Not
sure if it is due to some custom kernel changes done by the Proxmox team in
this particular kernel so have to ask if you are aware of any
incompatibility in the latest 4.4 kernels or is this PVE specific?

Sorry I'm not trying to hijack the thread, although it looks like that,
just the specific kernel version made me curious.

Thanks,
Igor

On Wed, Jun 14, 2017 at 1:02 AM, Roland Kammerer  wrote:

> On Tue, Jun 13, 2017 at 04:01:22PM +0200, JdT wrote:
> > version: 9.0.7-1 (api:2/proto:86-112)
> > GIT-hash: 91ecdc41ea558c8a2debb75b3441998c92bb0303 build by
> f.gruenbichler@nora, 2017-06-08 17:11:53
> > Transports (api:15): tcp (1.0.0)
>
> Okay, yes, I remember a fix in that regard. So this should improve with
> 9.0.8. If you can, that would be the perfect time to test rc2 and check
> if it is solved.
>
> Regards, rck
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>



-- 
Igor Cicimov | DevOps


p. +61 (0) 433 078 728
e. ig...@encompasscorporation.com
w. www.encompasscorporation.com
a. Level 4, 65 York Street, Sydney 2000
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD errors

2017-06-07 Thread Igor Cicimov
On 6 Jun 2017 7:23 pm, "Andrea del Monaco" <
andrea.delmon...@clustervision.com> wrote:

Hello everybody,

I am currently facing some issues with the DRBD syncronization.
Here is the config file:
global {
usage-count no;
}

common {
startup {
wfc-timeout 15;
degr-wfc-timeout 15;
outdated-wfc-timeout 15;
}
disk {
resync-rate 80M;
disk-flushes no;
disk-barrier no;
al-extents 3389;
c-fill-target 0;
c-plan-ahead 18;
c-max-rate 200M;
}
net {
protocol C;
max-buffers 8000;
max-epoch-size 8000;
sndbuf-size 1024k;
}
}

resource cmshareddrbdres {
net {
cram-hmac-alg sha1;
shared-secret xxx;
after-sb-0pri discard-younger-primary;
after-sb-1pri discard-secondary;
csums-alg md5;
}
on master1 {
device /dev/drbd1;
disk   /dev/sdb;
address10.149.255.254:7789;
meta-disk  internal;
}
on master2 {
device /dev/drbd1;
disk   /dev/sdb;
address10.149.255.253:7789;
meta-disk  internal;
}
}

The network 10.149.0.0/16 is using IPoIB.

The messages that i see are (first master): https://pastebin.com/0xCLceeD

Suspect messages:
[Sun Jun  4 03:50:17 2017] block drbd1: logical block size of local backend
does not match (drbd:512, backend:4096); was this a late attach?
[Sun Jun  4 03:51:01 2017] drbd cmshareddrbdres: [drbd_w_cmshared/3640]
sock_sendmsg time expired, ko = 6
[Sun Jun  4 03:34:12 2017] block drbd1: We did not send a P_BARRIER for
84203ms > ko-count (7) * timeout (60 * 0.1s); drbd kernel thread blocked?
(I see so many of these)

To me, i would say that there is some issue with the network, but i am not
sure, because in that case i would expect drbd to be able to send the
messages but going in timeout on the other side.

I have tried to stress it and i couldn't reproduce it, so it doesn't seem
to be load-related.

[root@master1 ~]# uname -r
3.10.0-327.el7.x86_64
[root@master1 ~]# rpm -qa | grep drbd
kmod-drbd84-8.4.7-1_1.el7.elrepo.x86_64
drbd84-utils-8.9.5-1.el7.elrepo.x86_64

Any ideas?


Regards,
-- 

[image: clustervision_logo.png]
Andrea Del Monaco
Internal Engineer


Mob: +31 64 166 4003
Skype: delmonaco.andrea
andrea.delmon...@clustervision.com

ClusterVision BV
Gyroscoopweg 56
1042 AC Amsterdam
The Netherlands
Tel: +31 20 407 7550
Fax: +31 84 759 8389
www.clustervision.com


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

The ko-count message in the log means the secondary fails to commit the
writes within the expected time frame, which looks to me like a backing
device storage/driver/OS issue rather than drbd. I would check whether that
works properly first if I were you.
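
A few generic checks on the secondary's backing disk can confirm or rule
that out quickly (these are ordinary Linux tools, not DRBD commands; the
device name /dev/sdb is taken from your resource file):

dmesg | grep -iE 'sdb|ata|scsi.*error|timeout'   # controller/driver errors
iostat -x /dev/sdb 5                             # watch await and %util under load
smartctl -a /dev/sdb                             # drive health, if smartmontools is installed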
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual-Primary DRBD node fenced after other node reboots UP

2017-05-23 Thread Igor Cicimov
On Tue, May 23, 2017 at 9:04 PM, Raman Gupta  wrote:

> > *why*
>
> > DRBD would not do that by itself,
> > so likely pacemaker decided to do that,
> > and you have to figure out *why*.
> > Pacemaker will have logged the reasons somewhere.
>
> The crm-fence-peer.sh script could not find the status of the peer node
> (which went down), assumed its status was "unknown" and thus placed a
> constraint on DRBD with a -INFINITY score, which essentially demotes and
> stops DRBD. The demotion failed because GFS2 was already mounted. This
> failure was construed as an error by Pacemaker and it scheduled stonith for
> itself when the down node was back.
>
> > "crm-fence-peer.sh" assumes that the result of "uname -n"
> > is the local nodes "pacemaker node name".
> Yes.
>
> > If "uname -n" and "crm_node -n" do not return the same thing for you,
> > the defaults will not work for you.
>
> For my network the replication network (and its hostname) is different
> from client facing network (and its hostname):
> [root@server7]# uname -n
> server7
> [root@server7]# crm_node -n
> server7ha
>
> However things seems to be working with these settings.
>
>
> >Then in addition to all your other trouble,
> > you have missing dependency constraints.
>
> The proper integration of DRBD+GFS2+DLM+CLVM resources into Pacemaker was
> the issue.
>

Which is the very thing pointed out in my answer to your previous thread,
http://marc.info/?l=drbd-user&m=14904721736&w=2, the *proper
integration*. There is no compromise regarding this: *ALL* of it has to be
managed by pacemaker, not just parts of this and that like in your case, and
started/stopped/colocated in the proper order by pacemaker. Period.
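
As a rough sketch of what "all of it managed by pacemaker" means in pcs
terms (resource and clone names here are illustrative and the exact syntax
depends on your pcs version, so treat this as a starting point rather than
a recipe):

pcs resource create p_drbd ocf:linbit:drbd drbd_resource=r0 \
    op monitor interval=10s role=Master op monitor interval=20s role=Slave
pcs resource master ms_drbd p_drbd \
    master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 \
    notify=true interleave=true
pcs constraint order promote ms_drbd then start dlm-clone
pcs constraint colocation add dlm-clone with master ms_drbd INFINITY
pcs constraint order start dlm-clone then start clvmd-clone

The point is that DRBD itself becomes a master/slave resource and DLM, cLVM
and GFS2 are ordered and colocated on top of it, so pacemaker controls the
whole stack.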


-- 
Igor Cicimov | DevOps


p. +61 (0) 433 078 728
e. ig...@encompasscorporation.com <http://encompasscorporation.com/>
w. www.encompasscorporation.com
a. Level 4, 65 York Street, Sydney 2000
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Fencing DRBD on Poweroff of Primary

2017-04-09 Thread Igor Cicimov
On 10 Apr 2017 7:42 am, "Marco Certelli"  wrote:

Hello. Thanks for the answer.
Maybe I was not clear: I do not want the automatic poweroff of the server.


Why do you have a problem with this? The server is already powering off, right?

My problem is that if I manually power off the primary node (i.e. the server
with the DRBD primary mounted on it), the secondary does not become primary
(promote) anymore!


From the docs:
Thus, if the DRBD replication link becomes disconnected, the
crm-fence-peer.sh script contacts the cluster manager, determines the
Pacemaker Master/Slave resource associated with this DRBD resource, and
ensures that the Master/Slave resource no longer gets promoted on any node
other than the currently active one. Conversely, when the connection is
re-established and DRBD completes its synchronization process, then that
constraint is removed and the cluster manager is free to promote the
resource on any node again.

It seems that the primary, just before powering off, fences the other node
and precludes it from becoming primary.


No, see above it is working as designed.

This is not the expected logic in a two-node cluster...
Is there a way?


On Sunday, 9 April 2017 at 22:48, Digimer  wrote:


On 09/04/17 03:05 PM, Marco Certelli wrote:
> Hello,
>
> very simple question for DRBD experts. I'm configuring Pacemaker (2
> nodes Active/Standby)+DRBD shared disk with the following config:
>
> ...
> disk {
>fencing resource-only;

This should be 'resource-and-stonith;'


>}
>handlers {
>fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>}
> ...
>
> It happens that if I power off the Active server (the one with the DRBD
> Primary mounted on it), the backup cannot promote and mount the DRBD
> anymore. This is not what I would like to happen and this problem does
> not occur if I remove the above fencing configuration (fencing,
> fence-peer and after-resync-target commands).
>
> My only objective is to prevent promotion of a disk that is under
> resync. Is there a solution? I was thinking of the following
configuration:
>
> ...
> disk {
>fencing resource-only;
>}
>handlers {
>before-resync-target "/usr/lib/drbd/crm-fence-peer.sh";
>after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>}
> ...
>
> Do you think it may work, without other negative effects?
>
> Thanks, Marco.

>
>
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>


-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould




___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Host with multiple ip addresses

2017-04-04 Thread Igor Cicimov
On 4 Apr 2017 11:08 pm, "Robert Altnoeder" 
wrote:

On 04/04/2017 02:48 PM, Frank Rust wrote:
> That’s what I tried, but it is not working, because the drbdmanage
software detects its own name by doing os.uname().
> And that reports the name from /etc/hostname, corresponding to the
external interface.
No, it does not. The nodename is not the hostname.

For drbdmanage to work, the name of each registered node must be the
node name of that node, and the IP address must be an address that
enables reaching the other registered nodes (e.g., all nodes must be on
the same network/subnetwork or routed appropriately).
The hostnames are irrelevant too.


Well, the OP reported this is not the case, at least for him. Do we have to
assume a networking issue then?


In other words, the name matters, the IP address does not, as long as
the hosts can use it to communicate. You cannot just use some other
name, you can however use different IP addresses.

Which IP address to use can be specified when initializing the
drbdmanage cluster and when adding nodes
(drbdmanage init <address>, drbdmanage add-node <name> <address>)

br,
--
Robert Altnoeder
+43 1 817 82 92 0
robert.altnoe...@linbit.com

LINBIT | Keeping The Digital World Running
DRBD - Corosync - Pacemaker
f /  t /  in /  g+

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Host with multiple ip addresses

2017-04-04 Thread Igor Cicimov
On Tue, Apr 4, 2017 at 9:42 PM, Roberto Resoli 
wrote:

> On 04/04/2017 at 13:08, Frank Rust wrote:
> > Hi folks,
> >
> > I am wondering if it would be possible to create a drbdmanage cluster
> where the hostname doesn’t match the IP address of the network interface to
> use.
> >
> > In detail:
> > I have a three node configuration with the ip visible to the outside:
> > node1 IP: 192.168.1.1  hostname fs1
> > node2 IP: 192.168.1.2  hostname fs2
> > node3 IP: 192.168.1.3  hostname fs3
> >
> > Each of the nodes has a second network adapter which shall be used for
> the storage distribution (and is not visible to the outside).
> >   node1 10.10.10.1
> >   node2 10.10.10.2
> >   node3 10.10.10.3
> >
> > if I do
> >   drbdmanage init 10.10.10.1
> >   drbdmanage add-node fs2 10.10.10.2
> >   drbdmanage add-node fs3 10.10.10.3
> >
> > It will not work because the IP-address of "fs1" is 192.168.1.1 and not
> 10.10.10.1 and so on.
> > I hope it’s possible to understand my description...
> >
> > I can not change the hostnames according to the storage network because
> it would break other services for the outside.
> >
> > So my question: how would I configure it to get the second network
> adapter used.
>
> My cluster is very similar to yours; here is my /etc/hosts, adapted to
> your IPs (and hostnames, replace "yourdomain.com" with your own); maybe
> pvelocalhost is redundant in your case:
>
> 127.0.0.1 localhost.localdomain localhost
> 192.168.1.1 fs1.yourdomain.com fs1 pvelocalhost
> 10.10.10.1 fs1.yourdomain.com  fs1
> 192.168.1.2 fs2.yourdomain.com fs2
> 10.10.10.2 fs2.yourdomain.com fs2
> 192.168.1.3 fs3.yourdomain.com fs3
> 10.10.10.3 fs3.yourdomain.com fs3
>
> I have initialized DRBD9 without problems ...
>
> rob
>
> Or simply use different host names for the other network like:

192.168.1.1  fs1
192.168.1.2  fs2
192.168.1.3  fs3
10.10.10.1   sfs1
10.10.10.2   sfs2
10.10.10.3   sfs3

and set the cluster using those names:

drbdmanage init 10.10.10.1
drbdmanage add-node sfs2 10.10.10.2
drbdmanage add-node sfs3 10.10.10.3

That's what I usually do; it also helps to differentiate which network we
are talking about from the host name perspective. Except if keeping the
original names is a must for some reason ...
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] disaster management on primary-primary setup

2017-04-01 Thread Igor Cicimov
On 2 Apr 2017 3:45 am, "Gregor Burck"  wrote:

Hi,

I'm testing drbd on a Debian system. (drbd 8.9.2)
My setup is two nodes with a primary-primary setup with gfs2

I mount the cluster resource in the local filesystem. (/dev/drbd0 on
/clusterdata type gfs2)

When I kill one node (pull the electrical wire) the still existing node
can't access the files:

ls -l /clusterdata hangs; I can't kill the command, even from another
root account with kill -9

Is this a problem with drbd or maybe with gfs2?

How to bring the situation under control?


How on earth do you expect people to know without showing any
configuration, logs and drbd and cluster status at the time this happens???


Bye

Gregor

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Problem with drbd and data alignment

2017-03-28 Thread Igor Cicimov
On 28 Mar 2017 6:42 pm, "Marco Marino"  wrote:

Hi Igor, you're right, I created the partition on the client but I'm a
bit confused because I'm using /dev/drbd3 as a backstore for targetcli so
I'm expecting something like this:
sde
|--drbd3
 |--drbd3p1
   |--myvolgroup_lv1

instead of

sde
|-sde1
|-drbd3


Furthermore, I'd like to understand if I need to change my configuration or
if it is OK for a production environment. The idea is that I need a drbd
device used as a backstore in targetcli and nothing else. I don't need to
resize it. On the initiator server I need to create a PV and then a VG and
many LVs. Actually I'm using a raw device as a backing device for drbd and
/dev/drbdX as a backstore. Let me know what you think about this.


That is perfectly fine: a single drbd device backed by a big block storage
device, exported as a single LUN over a target. Then a VG on the PV created
from the iSCSI block device, from which the LVs are created for the VMs.
This is fine for a single VM host.
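
For reference, a minimal targetcli sketch of that layout (the backstore and
target names here are only illustrative, not taken from your setup):

targetcli /backstores/block create name=drbd3 dev=/dev/drbd3
targetcli /iscsi create iqn.2017-03.com.example:drbd3
targetcli /iscsi/iqn.2017-03.com.example:drbd3/tpg1/luns create /backstores/block/drbd3

The initiator then sees one LUN (your /dev/sdf), and everything above it,
the partition, PV, VG and LVs, lives on the initiator side only.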

Thank you,
Marco





2017-03-27 23:40 GMT+02:00 Igor Cicimov :

>
>
> On 25 Mar 2017 8:16 pm, "Marco Marino"  wrote:
>
> Hi, I'm trying to understand how to configure raw devices when used with
> drbd. I think I have a problem with data alignment. Let me describe my case:
>
> I have a raw device /dev/sde on both nodes and on top of it there is the
> drbd device. So, in the .res configuration file I have
> 
> disk/dev/sde;
> device  /dev/drbd2;
> 
>
> At this point I used /dev/drbd2, *without create any partition,* as a
> backstore in the targetcli.
> On the iscsi initiator appeared one new device (/dev/sdf), and on top of
> it I create a new partition with type = LVM and finally a new PV, VG and
> many LVs.
> From the iscsi initiator it seems to be ok:
>
> [root@cv1 ~]# gdisk /dev/sdf
> GPT fdisk (gdisk) version 0.8.6
>
> Partition table scan:
>   MBR: protective
>   BSD: not present
>   APM: not present
>   GPT: present
>
> Found valid GPT with protective MBR; using GPT.
>
> Command (? for help): p
> Disk /dev/sdf: 7808351448 sectors, 3.6 TiB
> Logical sector size: 512 bytes
> Disk identifier (GUID): D5C7161E-5C4C-42B3-BEDE-520DC049D106
> Partition table holds up to 128 entries
> First usable sector is 34, last usable sector is 7808351414
> Partitions will be aligned on 2048-sector boundaries
> Total free space is 3221 sectors (1.6 MiB)
>
> Number  Start (sector)    End (sector)  Size     Code  Name
>      1            2048    7808350207    3.6 TiB  8E00  primary
>
> Command (? for help): q
> [root@cv1 ~]#
>
>
>
> The problem is on the drbd node where I have:
>
> [root@iscsi2 ~]# lsblk
> NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
> 
> sde  8:64   0   3,7T  0 disk
> ├─sde1   8:65   0   3,7T  0 part
> └─drbd2147:20   3,7T  0 disk
> [root@iscsi2 ~]#
>
> Futrhermore
>
> [root@iscsi2 ~]# gdisk /dev/sde
> GPT fdisk (gdisk) version 0.8.6
>
> Partition table scan:
>   MBR: protective
>   BSD: not present
>   APM: not present
>   GPT: present
>
> Found valid GPT with protective MBR; using GPT.
>
> Command (? for help): p
> Disk /dev/sde: 7808589824 sectors, 3.6 TiB
> Logical sector size: 512 bytes
> Disk identifier (GUID): D5C7161E-5C4C-42B3-BEDE-520DC049D106
> Partition table holds up to 128 entries
> First usable sector is 34, last usable sector is 7808351414
> Partitions will be aligned on 2048-sector boundaries
> Total free space is 3221 sectors (1.6 MiB)
>
> Number  Start (sector)    End (sector)  Size     Code  Name
>      1            2048    7808350207    3.6 TiB  8E00  primary
>
> Command (? for help): q
> [root@iscsi2 ~]#
>
>
>
>
> It seems that there are 2 "partitions" even though drbd2 is not a
> partition. Is this a problem related to my configuration?
>
>
> What do you mean 2 partitions? It is a partition on sde you created on the
> client and the drbd device itself. The lsblk shows you exactly that.
>
> My question is: how can I solve? Should I create a new partition on
> /dev/sde and use /dev/sde1 as a backing device for the drbd resource? Or
> perhaps should I create a partition on top of /dev/drbd2 before I can use
> the device as a backstore in targetcli??
>
> Thank you,
> Marco
>
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Problem with drbd and data alignment

2017-03-27 Thread Igor Cicimov
On 25 Mar 2017 8:16 pm, "Marco Marino"  wrote:

Hi, I'm trying to understand how to configure raw devices when used with
drbd. I think I have a problem with data alignment. Let me describe my case:

I have a raw device /dev/sde on both nodes and on top of it there is the
drbd device. So, in the .res configuration file I have

disk/dev/sde;
device  /dev/drbd2;


At this point I used /dev/drbd2, *without create any partition,* as a
backstore in the targetcli.
On the iscsi initiator appeared one new device (/dev/sdf), and on top of it
I create a new partition with type = LVM and finally a new PV, VG and many
LVs.
From the iscsi initiator it seems to be ok:

[root@cv1 ~]# gdisk /dev/sdf
GPT fdisk (gdisk) version 0.8.6

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): p
Disk /dev/sdf: 7808351448 sectors, 3.6 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): D5C7161E-5C4C-42B3-BEDE-520DC049D106
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 7808351414
Partitions will be aligned on 2048-sector boundaries
Total free space is 3221 sectors (1.6 MiB)

Number  Start (sector)    End (sector)  Size     Code  Name
     1            2048    7808350207    3.6 TiB  8E00  primary

Command (? for help): q
[root@cv1 ~]#



The problem is on the drbd node where I have:

[root@iscsi2 ~]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT

sde  8:64   0   3,7T  0 disk
├─sde1   8:65   0   3,7T  0 part
└─drbd2147:20   3,7T  0 disk
[root@iscsi2 ~]#

Futrhermore

[root@iscsi2 ~]# gdisk /dev/sde
GPT fdisk (gdisk) version 0.8.6

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): p
Disk /dev/sde: 7808589824 sectors, 3.6 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): D5C7161E-5C4C-42B3-BEDE-520DC049D106
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 7808351414
Partitions will be aligned on 2048-sector boundaries
Total free space is 3221 sectors (1.6 MiB)

Number  Start (sector)    End (sector)  Size     Code  Name
     1            2048    7808350207    3.6 TiB  8E00  primary

Command (? for help): q
[root@iscsi2 ~]#




It seems that there are 2 "partitions" even though drbd2 is not a
partition. Is this a problem related to my configuration?


What do you mean 2 partitions? It is a partition on sde you created on the
client and the drbd device itself. The lsblk shows you exactly that.

My question is: how can I solve? Should I create a new partition on
/dev/sde and use /dev/sde1 as a backing device for the drbd resource? Or
perhaps should I create a partition on top of /dev/drbd2 before I can use
the device as a backstore in targetcli??

Thank you,
Marco


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] GFS2 - DualPrimaryDRBD hangs if a node Crashes

2017-03-24 Thread Igor Cicimov
On 25 Mar 2017 11:00 am, "Igor Cicimov"  wrote:

Raman,

On Sat, Mar 25, 2017 at 12:07 AM, Raman Gupta 
wrote:

> Hi,
>
> Thanks for looking into this issue. Here is my 'pcs status' and attached
> is cib.xml pacemaker file
>
> [root@server4 cib]# pcs status
> Cluster name: vCluster
> Stack: corosync
> Current DC: server7ha (version 1.1.15-11.el7_3.4-e174ec8) - partition with
> quorum
> Last updated: Fri Mar 24 18:33:05 2017  Last change: Wed Mar 22
> 13:22:19 2017 by root via cibadmin on server7ha
>
> 2 nodes and 7 resources configured
>
> Online: [ server4ha server7ha ]
>
> Full list of resources:
>
>  vCluster-VirtualIP-10.168.10.199   (ocf::heartbeat:IPaddr2):
> Started server7ha
>  vCluster-Stonith-server7ha (stonith:fence_ipmilan):Started
> server4ha
>  vCluster-Stonith-server4ha (stonith:fence_ipmilan):Started
> server7ha
>  Clone Set: dlm-clone [dlm]
>  Started: [ server4ha server7ha ]
>  Clone Set: clvmd-clone [clvmd]
>  Started: [ server4ha server7ha ]
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> [root@server4 cib]#
>
>
This shows us the problem: you have not configured any DRBD resource in
Pacemaker, hence it has no knowledge of it and no control over it.

This is from one of my clusters:

Online: [ sl01 sl02 ]

 p_fence_sl01 (stonith:fence_ipmilan): Started sl02
 p_fence_sl02 (stonith:fence_ipmilan): Started sl01
* Master/Slave Set: ms_drbd [p_drbd_r0]*
* Masters: [ sl01 sl02 ]*
 Clone Set: cl_dlm [p_controld]
 Started: [ sl01 sl02 ]
 Clone Set: cl_fs_gfs2 [p_fs_gfs2]
 Started: [ sl01 sl02 ]

You can notice the resources you are missing in bold; more specifically, you
have not configured DRBD and its MS resource, nor the colocation and
constraint resources. So the "resource-and-stonith" hook in your drbd
config will never work, as Pacemaker does not know about any drbd resources.

This is from one of my production clusters; it's on Ubuntu, so no PCS, just
CRM, and I'm not using cLVM, just DLM:

primitive p_controld ocf:pacemaker:controld \
op monitor interval="60" timeout="60" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100" \
params daemon="dlm_controld" \
meta target-role="Started"
*primitive p_drbd_r0 ocf:linbit:drbd \*
* params drbd_resource="r0" adjust_master_score="0 10 1000 1" \*
* op monitor interval="10" role="Master" \*
* op monitor interval="20" role="Slave" \*
* op start interval="0" timeout="240" \*
* op stop interval="0" timeout="100"*
*ms ms_drbd p_drbd_r0 \*
* meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1"
notify="true" interleave="true" target-role="Started"*
primitive p_fs_gfs2 ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/data" fstype="gfs2"
options="_netdev,noatime,rw,acl" \
op monitor interval="20" timeout="40" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60" \
meta is-managed="true"
clone cl_dlm p_controld \
meta globally-unique="false" interleave="true" target-role="Started"
clone cl_fs_gfs2 p_fs_gfs2 \
meta globally-unique="false" interleave="true" ordered="true"
target-role="Started"
colocation cl_fs_gfs2_dlm inf: cl_fs_gfs2 cl_dlm
*colocation co_drbd_dlm inf: cl_dlm ms_drbd:Master*
order o_dlm_fs_gfs2 inf: cl_dlm:start cl_fs_gfs2:start
*order o_drbd_dlm_fs_gfs2 inf: ms_drbd:promote cl_dlm:start
cl_fs_gfs2:start*

I have excluded the fencing stuff for brevity and highlighted the resources
you are missing. Check the rest as well though; you might find something you
can use or cross-check with your config.

Also thanks to Digimer about the very useful information (as always) he
contributed explaining how the things actually work.


Just noticed your gfs2 is out of pacemaker control; you need to sort that
out too.
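
For completeness, a pcs-style sketch of a cloned GFS2 filesystem resource
(device, mount point and resource names are illustrative; check the exact
option names against your pcs version):

pcs resource create p_fs_gfs2 ocf:heartbeat:Filesystem \
    device=/dev/drbd1 directory=/backup fstype=gfs2 options=noatime \
    op monitor interval=20s --clone interleave=true

It then needs order and colocation constraints against DLM and the DRBD
master, along the lines of the CRM config above.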



> On Fri, Mar 24, 2017 at 1:49 PM, Raman Gupta 
> wrote:
>
>> Hi All,
>>
>> I am having a problem where if in GFS2 dual-Primary-DRBD Pacemaker
>> Cluster, a node crashes then the running node hangs! The CLVM commands
>> hang, the libvirt VM on running node hangs.
>>
>> Env:
>> -
>> CentOS 7.3
>> DRBD 8.4
>> gfs2-utils-3.1.9-3.el7.x86_64
>> Pacemaker 1.1.15-11.el7_3.4
>> corosync-2.4.0-4.el7.x86_64
>>
>>
>> Infrastructure:
>> 
>> 1) Runnin

Re: [DRBD-user] GFS2 - DualPrimaryDRBD hangs if a node Crashes

2017-03-24 Thread Igor Cicimov
Raman,

On Sat, Mar 25, 2017 at 12:07 AM, Raman Gupta 
wrote:

> Hi,
>
> Thanks for looking into this issue. Here is my 'pcs status' and attached
> is cib.xml pacemaker file
>
> [root@server4 cib]# pcs status
> Cluster name: vCluster
> Stack: corosync
> Current DC: server7ha (version 1.1.15-11.el7_3.4-e174ec8) - partition with
> quorum
> Last updated: Fri Mar 24 18:33:05 2017  Last change: Wed Mar 22
> 13:22:19 2017 by root via cibadmin on server7ha
>
> 2 nodes and 7 resources configured
>
> Online: [ server4ha server7ha ]
>
> Full list of resources:
>
>  vCluster-VirtualIP-10.168.10.199   (ocf::heartbeat:IPaddr2):
> Started server7ha
>  vCluster-Stonith-server7ha (stonith:fence_ipmilan):Started
> server4ha
>  vCluster-Stonith-server4ha (stonith:fence_ipmilan):Started
> server7ha
>  Clone Set: dlm-clone [dlm]
>  Started: [ server4ha server7ha ]
>  Clone Set: clvmd-clone [clvmd]
>  Started: [ server4ha server7ha ]
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> [root@server4 cib]#
>
>
This shows us the problem: you have not configured any DRBD resource in
Pacemaker, hence it has no knowledge of it and no control over it.

This is from one of my clusters:

Online: [ sl01 sl02 ]

 p_fence_sl01 (stonith:fence_ipmilan): Started sl02
 p_fence_sl02 (stonith:fence_ipmilan): Started sl01
* Master/Slave Set: ms_drbd [p_drbd_r0]*
* Masters: [ sl01 sl02 ]*
 Clone Set: cl_dlm [p_controld]
 Started: [ sl01 sl02 ]
 Clone Set: cl_fs_gfs2 [p_fs_gfs2]
 Started: [ sl01 sl02 ]

You can notice the resources you are missing in bold; more specifically, you
have not configured DRBD and its MS resource, nor the colocation and
constraint resources. So the "resource-and-stonith" hook in your drbd
config will never work, as Pacemaker does not know about any drbd resources.

This is from one of my production clusters; it's on Ubuntu, so no PCS, just
CRM, and I'm not using cLVM, just DLM:

primitive p_controld ocf:pacemaker:controld \
op monitor interval="60" timeout="60" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100" \
params daemon="dlm_controld" \
meta target-role="Started"
*primitive p_drbd_r0 ocf:linbit:drbd \*
* params drbd_resource="r0" adjust_master_score="0 10 1000 1" \*
* op monitor interval="10" role="Master" \*
* op monitor interval="20" role="Slave" \*
* op start interval="0" timeout="240" \*
* op stop interval="0" timeout="100"*
*ms ms_drbd p_drbd_r0 \*
* meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1"
notify="true" interleave="true" target-role="Started"*
primitive p_fs_gfs2 ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/data" fstype="gfs2"
options="_netdev,noatime,rw,acl" \
op monitor interval="20" timeout="40" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60" \
meta is-managed="true"
clone cl_dlm p_controld \
meta globally-unique="false" interleave="true" target-role="Started"
clone cl_fs_gfs2 p_fs_gfs2 \
meta globally-unique="false" interleave="true" ordered="true"
target-role="Started"
colocation cl_fs_gfs2_dlm inf: cl_fs_gfs2 cl_dlm
*colocation co_drbd_dlm inf: cl_dlm ms_drbd:Master*
order o_dlm_fs_gfs2 inf: cl_dlm:start cl_fs_gfs2:start
*order o_drbd_dlm_fs_gfs2 inf: ms_drbd:promote cl_dlm:start
cl_fs_gfs2:start*

I have excluded the fencing stuff for brevity and highlighted the resources
you are missing. Check the rest as well though; you might find something you
can use or cross-check with your config.

Also thanks to Digimer about the very useful information (as always) he
contributed explaining how the things actually work.


> On Fri, Mar 24, 2017 at 1:49 PM, Raman Gupta 
> wrote:
>
>> Hi All,
>>
>> I am having a problem where if in GFS2 dual-Primary-DRBD Pacemaker
>> Cluster, a node crashes then the running node hangs! The CLVM commands
>> hang, the libvirt VM on running node hangs.
>>
>> Env:
>> -
>> CentOS 7.3
>> DRBD 8.4
>> gfs2-utils-3.1.9-3.el7.x86_64
>> Pacemaker 1.1.15-11.el7_3.4
>> corosync-2.4.0-4.el7.x86_64
>>
>>
>> Infrastructure:
>> 
>> 1) Running A 2 node Pacemaker Cluster with proper fencing between the
>> two. Nodes are server4 and server7.
>>
>> 2) Running DRBD dual-Primary and hosting GFS2 filesystem.
>>
>> 3) Pacemaker has DLM and cLVM resources configured among others.
>>
>> 4) A KVM/QEMU virtual machine is running on server4 which is holding the
>> cluster resources.
>>
>>
>> Normal:
>> 
>> 5) In normal condition when the two nodes are completely UP then things
>> are fine. The DRBD dual-primary works fine. The disk of VM is hosted on
>> DRBD mount directory /backup and VM runs fine with Live Migration happily
>> happening between the 2 nodes.
>>
>>
>> Problem:
>> 
>> 6) Stop server7 [shutdown -h now] ---> LVM commands like pvdisplay hangs,
>> VM runs only for 120s ---> After 120s DRBD/GFS2 panics (/var/log/messages
>> below) in server4 an

Re: [DRBD-user] GFS2 - DualPrimaryDRBD hangs if a node Crashes

2017-03-24 Thread Igor Cicimov
On Fri, Mar 24, 2017 at 7:19 PM, Raman Gupta  wrote:

> Hi All,
>
> I am having a problem where if in GFS2 dual-Primary-DRBD Pacemaker
> Cluster, a node crashes then the running node hangs! The CLVM commands
> hang, the libvirt VM on running node hangs.
>
> Env:
> -
> CentOS 7.3
> DRBD 8.4
> gfs2-utils-3.1.9-3.el7.x86_64
> Pacemaker 1.1.15-11.el7_3.4
> corosync-2.4.0-4.el7.x86_64
>
>
> Infrastructure:
> 
> 1) Running A 2 node Pacemaker Cluster with proper fencing between the two.
> Nodes are server4 and server7.
>
> 2) Running DRBD dual-Primary and hosting GFS2 filesystem.
>
> 3) Pacemaker has DLM and cLVM resources configured among others.
>
> 4) A KVM/QEMU virtual machine is running on server4 which is holding the
> cluster resources.
>
>
> Normal:
> 
> 5) In normal condition when the two nodes are completely UP then things
> are fine. The DRBD dual-primary works fine. The disk of VM is hosted on
> DRBD mount directory /backup and VM runs fine with Live Migration happily
> happening between the 2 nodes.
>
>
> Problem:
> 
> 6) Stop server7 [shutdown -h now] ---> LVM commands like pvdisplay hangs,
> VM runs only for 120s ---> After 120s DRBD/GFS2 panics (/var/log/messages
> below) in server4 and DRBD mount directory (/backup) becomes unavailable
> and VM hangs in server4. The DRBD though is fine on server4 and in
> Primary/Secondary mode in WFConnection state.
>
> Mar 24 11:29:28 server4 crm-fence-peer.sh[54702]: invoked for vDrbd
> Mar 24 11:29:28 server4 crm-fence-peer.sh[54702]: WARNING drbd-fencing
> could not determine the master id of drbd resource vDrbd
> *Mar 24 11:29:28 server4 kernel: drbd vDrbd: helper command: /sbin/drbdadm
> fence-peer vDrbd exit code 1 (0x100)*
> *Mar 24 11:29:28 server4 kernel: drbd vDrbd: fence-peer helper broken,
> returned 1*
>

I guess this is the problem. Since the drbd fencing script fails, DLM will
hang to avoid resource corruption, as it has no information about the
status of the other node.


> Mar 24 11:32:01 server4 kernel: INFO: task kworker/8:1H:822 blocked for
> more than 120 seconds.
> Mar 24 11:32:01 server4 kernel: "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> Mar 24 11:32:01 server4 kernel: kworker/8:1HD 880473796c18 0
> 822  2 0x0080
> Mar 24 11:32:01 server4 kernel: Workqueue: glock_workqueue glock_work_func
> [gfs2]
> Mar 24 11:32:01 server4 kernel: 88027674bb10 0046
> 8802736e9f60 88027674bfd8
> Mar 24 11:32:01 server4 kernel: 88027674bfd8 88027674bfd8
> 8802736e9f60 8804757ef808
> Mar 24 11:32:01 server4 kernel:  8804757efa28
> 8804757ef800 880473796c18
> Mar 24 11:32:01 server4 kernel: Call Trace:
> Mar 24 11:32:01 server4 kernel: [] schedule+0x29/0x70
> Mar 24 11:32:01 server4 kernel: []
> drbd_make_request+0x2a4/0x380 [drbd]
> Mar 24 11:32:01 server4 kernel: [] ?
> aes_decrypt+0x260/0xe10
> Mar 24 11:32:01 server4 kernel: [] ?
> wake_up_atomic_t+0x30/0x30
> Mar 24 11:32:01 server4 kernel: []
> generic_make_request+0x109/0x1e0
> Mar 24 11:32:01 server4 kernel: [] submit_bio+0x71/0x150
> Mar 24 11:32:01 server4 kernel: []
> gfs2_meta_read+0x121/0x2a0 [gfs2]
> Mar 24 11:32:01 server4 kernel: []
> gfs2_meta_indirect_buffer+0x62/0x150 [gfs2]
> Mar 24 11:32:01 server4 kernel: [] ?
> load_balance+0x192/0x990
>
> 7) After server7 is UP, Pacemaker Cluster is started, DRBD started and
> Logical Volume activated and only after that in server4 the DRBD mount
> directory (/backup) becomes available and VM resumes in server4.  So after
> server7 is down and till it is completely UP the VM in server4 hangs.
>
>
> Can anyone help how to avoid running node hang when other node crashes?
>
>
> Attaching DRBD config file.
>
>
Do you actually have fencing configured in pacemaker? Since you have the drbd
fencing policy set to "resource-and-stonith" you *must* have fencing set up
in pacemaker too. Have you also set no-quorum-policy="ignore" in pacemaker?
Maybe show us your pacemaker config too so we don't have to guess.

Not related to the problem, but I would also add the "after-resync-target"
handler:

handlers {
  ...
fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}


>
> --Raman
>
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Proxmox VE 4.x DRBD 9 plugin - how is it supposed to work?

2017-02-28 Thread Igor Cicimov
On Wed, Mar 1, 2017 at 2:52 AM, Sean M. Pappalardo <
spappala...@renegadetech.com> wrote:

>
> Only because that's all the distro had available as I was running an old
> one. This is part of the reason for the upgrade. Unfortunately,
> Proxmox's kernel includes the DRBD 9 module only. (Is there any chance
> Linbit's repo could offer a package that supplies the 8.4 module so we
> users can choose until 9.x is production-stable?)
>
>
You can find a procedure for downgrading to 8.4 here:
http://coolsoft.altervista.org/en/blog/2016/08/proxmox-41-kernel-panic-downgrade-drbd-resources-drbd-9-84
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD and ZFS

2017-02-26 Thread Igor Cicimov
On Sat, Feb 25, 2017 at 9:03 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> Anyone using DRBD (8 or 9) with ZFS?
> Any suggestion/howto?
>
> I know that ZFS would like to have direct access to disks, but with
> ZFS this won't be possible.
> Any drawbacks?
>
> Additionally, is DRBDv9 with 3way replication considered stable for
> production use ?
>

Officially no, but from what I can see on this list many people are using
it without any issues.



> DRBD8 with only 2 servers is too subject to splitbrains
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>



-- 
Igor Cicimov | DevOps


p. +61 (0) 433 078 728
e. ig...@encompasscorporation.com <http://encompasscorporation.com/>
w. www.encompasscorporation.com
a. Level 4, 65 York Street, Sydney 2000
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD and ZFS

2017-02-26 Thread Igor Cicimov
On 26 Feb 2017 10:58 pm, "Gandalf Corvotempesta" <
gandalf.corvotempe...@gmail.com> wrote:

2017-02-26 10:33 GMT+01:00 Rabin Yasharzadehe :
> what about putting DRBD over ZVOL ?

If possible, I have no issue in doing this.
Anyone using DRBD over ZFS ?
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Drbd over zfs is the most common use case. Nothing special here; you just
benefit from the compression and snapshots of the underlying zfs layer.

With zfs over drbd you need to think about resource failover, i.e. the zfs
pool needs to be exported from one node and imported on the other one,
something that pacemaker can handle for you, for example. You also can't
have dual primary in this case, because you can't have a zfs pool imported
on two servers at the same time.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] ocf:linbit:drbd: DRBD Split-Brain not detected in non standard setup

2017-02-24 Thread Igor Cicimov
On 25 Feb 2017 3:32 am, "Dr. Volker Jaenisch" 
wrote:

Servus !
Am 24.02.2017 um 15:53 schrieb Lars Ellenberg:

On Fri, Feb 24, 2017 at 03:08:04PM +0100, Dr. Volker Jaenisch wrote:

If both 10Gbit links fail then the bond0 aka the worker connection fails
and DRBD goes - as expected - into split brain. But that is not the problem.

DRBD will be *disconnected*, yes.

Sorry, I was not precise in my wording. But I assumed that after going into
the disconnect state the cluster manager is informed and reflects this somehow.
I now noticed that a CIB rule is set on the former primary to stay primary
(please have a look at the cluster state at the end of this email), but I
still wonder why this is not reflected in the crm status. I was misled by
this missing status information and wrongly concluded that the
ocf:linbit:drbd plugin does not inform the CRM/CIB. Sorry for blaming
drbd.

But I am still confused about the behavior of pacemaker in not reflecting
the change of DRBD in the crm status. Maybe this question should go to the
pacemaker list.

But no reason for it to be "split brain"ed yet.
and with proper fencing configured, it won't.

This is our DRBD config. This is all quite basic:

resource r0 {

  disk {
fencing resource-only;
  }


This needs to be:

fencing resource-and-stonith;

if you wish drbd to tell crm to take any action.


  handlers {
fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }

  on mail1 {
device/dev/drbd1;
disk  /dev/sda1;
address   172.27.250.8:7789;
meta-disk internal;
  }
  on mail2 {
device/dev/drbd1;
disk  /dev/sda1;
address   172.27.250.9:7789;
meta-disk internal;
  }
}

*What did we miss?* We have no stonith configured yet. And IMHO a missing
stonith configuration should not interfere with the DRBD state change. Or
am I wrong with this assumption?


State after bond0 goes down:

root@mail1:/home/volker# crm status
Stack: corosync
Current DC: mail2 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Fri Feb 24 16:56:44 2017  Last change: Fri Feb 24
16:45:19 2017 by root via cibadmin on mail2

2 nodes and 7 resources configured

Online: [ mail1 mail2 ]

Full list of resources:

 Master/Slave Set: ms_drbd_mail [drbd_mail]
 Masters: [ mail2 ]
 Slaves: [ mail1 ]
 Resource Group: FS_IP
 fs_mail(ocf::heartbeat:Filesystem):Started mail2
 vip_193.239.30.23  (ocf::heartbeat:IPaddr2):   Started mail2
 vip_172.27.250.7   (ocf::heartbeat:IPaddr2):   Started mail2
 Resource Group: Services
 postgres_pg2   (ocf::heartbeat:pgsql): Started mail2
 Dovecot(lsb:dovecot):  Started mail2

Failed Actions:
* vip_172.27.250.7_monitor_3 on mail2 'not running' (7): call=55,
status=complete, exitreason='none',
last-rc-change='Fri Feb 24 16:47:07 2017', queued=0ms, exec=0ms

root@mail2:/home/volker# drbd-overview
 1:r0/0  StandAlone Primary/Unknown UpToDate/Outdated /shared/data ext4
916G 12G 858G 2%

root@mail1:/home/volker# drbd-overview



 1:r0/0  WFConnection Secondary/Unknown UpToDate/DUnknown


And after bringing up bond0 again the same state on both machines.
After cleanup of the failed VIP interface still the same state:

root@mail2:/home/volker# crm status
Stack: corosync
Current DC: mail2 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Fri Feb 24 17:01:05 2017  Last change: Fri Feb 24
16:59:32 2017 by hacluster via crmd on mail2

2 nodes and 7 resources configured

Online: [ mail1 mail2 ]

Full list of resources:

 Master/Slave Set: ms_drbd_mail [drbd_mail]
 Masters: [ mail2 ]
 Slaves: [ mail1 ]
 Resource Group: FS_IP
 fs_mail(ocf::heartbeat:Filesystem):Started mail2
 vip_193.239.30.23  (ocf::heartbeat:IPaddr2):   Started mail2
 vip_172.27.250.7   (ocf::heartbeat:IPaddr2):   Started mail2
 Resource Group: Services
 postgres_pg2   (ocf::heartbeat:pgsql): Started mail2
 Dovecot(lsb:dovecot):  Started mail2

root@mail2:/home/volker# drbd-overview
 1:r0/0  StandAlone Primary/Unknown UpToDate/Outdated /shared/data ext4
916G 12G 858G 2%

root@mail1:/home/volker# drbd-overview
 1:r0/0  WFConnection Secondary/Unknown UpToDate/DUnknown

After issuing a

mail2# drbdadm connect all

the nodes resync and everything is in best order (The "sticky" rule is
cleared also).

Cheers,

Volker

General Setup : Stock Debian Jessie without any modifications. DRBD,
Pacemaker etc. all Debian.

Here our crm config:

node 740030984: mail1 \
attributes standby=off
node 740030985: mail2 \
attributes standby=off
primitive Dovecot lsb:dovecot \
op monitor interval=20s timeout=15s \
meta target-role=Started
primitive drbd_mail ocf:linbit:drbd \
params drbd_resource=r0 \
op monitor interval=15s role=Master \
op monitor interval=16s role=Slave \
op start interval=0 timeout=240s \
op stop interval=0

Re: [DRBD-user] drbd8 vs drbd9 performance over 10Gbit link

2017-02-09 Thread Igor Cicimov
On 9 Feb 2017 6:02 pm, "Dr. Volker Jaenisch" 
wrote:

Hi!

I have two pairs of servers over the same 10 Gbit link:

1) Pair A is replicating a 1TB Sata Disk using drbd8 (Debian Jessie 8.6).

2) Pair B is replicating a 0.8TB volume with an underlying hardware RAID
10 consisting of 6 SAS disks, stripe 16k, utilizing drbd9 (kernel 4.4,
Proxmox)

The server pair B should beat server pair A by at least a factor of 3.


I can't speak for the drbd9 side, but have you done a baseline benchmark of
the disk throughput to actually prove that and eliminate any hardware issues
or misconfiguration on pair B?
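
Something along these lines gives a rough baseline of the backing storage
on each pair, independent of DRBD (run it against a scratch file or a spare
LV on the RAID volume, never against a device DRBD is actively using; the
paths here are only illustrative):

dd if=/dev/zero of=/mnt/scratch/testfile bs=1M count=4096 oflag=direct conv=fsync

or, if fio is available, something closer to a streaming resync pattern:

fio --name=seqwrite --filename=/mnt/scratch/testfile --rw=write --bs=1M \
    --size=4G --direct=1

If pair B cannot beat pair A on the raw disks, the DRBD comparison is moot.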


In case of a sync:

* Pair A delivers a sync rate of 20 Mb/sec.

* Pair B delivers a sync rate of 1 MB/sec

Any ideas?

Any help appreciated

Volker

--
=
   inqbus Scientific ComputingDr.  Volker Jaenisch
   Richard-Strauss-Straße 1   +49(08861) 690 474 0
   86956 Schongau-Westhttp://www.inqbus.de
=


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Testing new DRBD9 dedicated repo for PVE

2017-01-04 Thread Igor Cicimov
On 04/01/2017 7:32 pm, "Roland Kammerer"  wrote:

On Tue, Jan 03, 2017 at 10:38:38AM +, Enrica Ruedin wrote:
> I can't understand Linbit to change license in such a way that PVE
> Proxmox has to remove their support completely. This unnecessary
> change leads to confusion.

But you do understand that we also eat and like to have a roof over our
heads? Hey, the Aparthotel Guggach Hotel from which domain you wrote
looks nice, that could solve one problem! ;-)

In the end the license - and sure IANAL - is a "do whatever you like,
but don't interfere with LINBIT's support business". IMO that's it. This
was not a move against Proxmox at all, there are other vendors not
playing nice... Just writing that because on the ML I have the
impression that there is some kind of "Proxmox vs. LINBIT". It isn't.

For the "has to remove [...] completely": Puh, dangerous territory to
comment on, but that was the decision of Proxmox.


No serious company would integrate drbd in their product and then tell
their customers to talk to Linbit in case of issues.

They have all the
right to do that! No blaming, no nothing. There are other
projects/companies that took another road and still ship, integrate, or
use drbdmanage with the current license.

Regards, rck
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD8, dedicated repository for PVE 4

2016-12-26 Thread Igor Cicimov
On 27 Dec 2016 4:15 am, "Jasmin J."  wrote:

Hello Dietmar!


> The functionality you are looking for already exists using DRBD9. It is
> called drbdmanage, and Linbit provides a repository including all
> packages and a storage driver for PVE.
Yes it is ... BUT ...


> AFAIK DRBD9 is stable (or will be soon?)
See here the post from Lars:
  http://lists.linbit.com/pipermail/drbd-user/2016-November/023379.html
-> If you have a two node setup, stay with DRBD 8.4,
   you don't gain anything from the new features of DRBD 9,
   but, as you found out, still may be hit by the regressions.

Because of this post, I removed DRBD9 from my server, switched to DRBD 8.4
and wrote my Proxmox DRBD8 Storage Plugin. I would have loved to have no
work to do and to use the already existing tools.


> that is simply not necessary (DRBD9 auto promotion ...)
I like that, but it is not available in DRBD 8.4. So a Proxmox DRBD8 Storage
Plugin would need to use dual primary mode and switch both sides to primary.
If Proxmox PVE executed activate_volume on one server and deactivate_volume
on the other a few seconds later, this would work. But then, if the
cluster falls apart and for whatever reason it gets activated on both sides,
it might be a disaster.

PVE has watchdog fencing to prevent this from happening.

Therefore I simply disabled this in my Storage Plugin.
If someone likes to implement it, he can do this and test. My servers are
now
productive and I have no reason to change a running system!
And ofcourse everybody is free to use DRBD9, drbdmanage and the Proxmox DRBD
Storge Plugin (for DRBD 9.x) provided by Linbit. In the long run THIS is the
preferred strategy.


> ALL DRBD9 related code is now managed by Linbit. AFAIK they plan to
provide
> documentation soon on the DRBD site.
Yes, they did already announce it for Proxmox, and the documentation about
9.x is also very good (the 8.4 documentation was too), which is a very good
thing. But as written above, it is currently not 100% stable and not
recommended in its current state, especially for a two node cluster.

BR,
   Jasmin

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD8, dedicated repository for PVE 4

2016-12-22 Thread Igor Cicimov
On Fri, Dec 23, 2016 at 3:04 PM, Jasmin J.  wrote:

> Hi!
>
> > If someone is interested in my Proxmox DRBD8 Storage Plugin, I can try to
> > release it in the next days, even if I have no time due to the coming
> > vacation.
> Did it now. You can find it on GitHub:
>https://github.com/jasmin-j/pve-storage-custom-DRBD8
>
> BR,
>Jasmin
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>

Thanks Jasmin, I'll try to find some time over the holidays and test this.

Cheers,
-- 
Igor Cicimov | DevOps


p. +61 (0) 433 078 728
e. ig...@encompasscorporation.com <http://encompasscorporation.com/>
w. www.encompasscorporation.com
a. Level 4, 65 York Street, Sydney 2000
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Howto define disk-barrier/disk-flushes different on both hosts

2016-12-16 Thread Igor Cicimov
On 17 Dec 2016 1:07 pm, "Jasmin J."  wrote:

Hi!

I have a machine (A) with a RAID1 and a BBU. On top of a partition of this
RAID
is LVM and then DRBD 8.4.

The other machine (B), which is the DRBD mirror for the aforementioned
partition, has a normal SATA disk. I try to use Protocol A, so it makes
sense to configure disk-barrier and disk-flushes differently on machine A
(quick) and B (slower).

Here is my config:
resource vm-100-disk-root {
 net {
 # allow-two-primaries;
 after-sb-0pri discard-zero-changes;
 after-sb-1pri discard-secondary;
 after-sb-2pri disconnect;
 }
 disk {
 no-disk-barrier;

This is 8.3 syntax

 }
 volume 0 {
 device /dev/drbd0;
 meta-disk internal;
 }
 on serverA {
 # LVM on top of RAID1
 disk /dev/vg_vm_disks_A/vm_100_disk_root;
 address 10.1.0.1:7788;
 # we have a BBU on the RAID controller, so no flushing
 # necessary
 no-disk-flushes;

This one too

 }
 on serverB {
 # /dev/sdc1 normal SATA disk
 disk /dev/vg_vm_disks_A/vm_100_disk_root;
 address 10.1.0.2:7788;
 # this is a local disk without battery and cache
 disk-flushes;
 }
 }

But this gives:
  $ drbdadm adjust vm-100-disk-root
  drbd.d/vm_100_disk_root.res:22: Parse error: 'disk | device | address |
 meta-disk | flexible-meta-disk' expected, but got 'no-disk-flushes

Hence drbd complains. I leave it to you to find the correct syntax in the
8.4 conf manual.
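
As a hint only (please verify against the 8.4 man page; I have not checked
whether these options are accepted per host): in 8.4 these are boolean
options that live inside a disk section, e.g. at resource level

disk {
    disk-barrier no;
    disk-flushes no;
    md-flushes no;
}

rather than the 8.3-style no-disk-barrier/no-disk-flushes keywords.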

So it seems this is the wrong syntax to describe what I want.

Can someone explain if it is possible to define different "disk-flushes"
options to different hosts/disks with another syntax?
If the answer is yes, how can I do this?
If the answer is no, is there a technical reason or simply "not required
till
now and therefore not implemented"?

BR,
   Jasmin
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Configuration DRBD cache, timeouts and locking

2016-11-27 Thread Igor Cicimov
On 28 Nov 2016 6:18 am,  wrote:
>
> Hello,
>
> I don't speak English very well, but I hope to make my request at least
> partly clear.
>
> My task is to look for an opportunity to optimize the reliability of a
> DRBD connection (version 9) over a LAN between three or four mobile
> devices. The problem with mobile devices is that they may occasionally
> (and temporarily) lose their connection to each other, but DRBD has to
> stay alive in the most efficient manner possible.
> The paramount topic is a comparison of energy efficiency between using
> DRBD and using NFS at the network edge.
> In NFS I found some configuration options to try, especially ones
> concerning caching, locking and timeouts.

DRBD works at the block device level, so there is no file system there. This
is what you need to understand, rather than comparing it with NFS.
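
That said, DRBD does have its own knobs for how it reacts to a flaky link;
they live in the net section of the resource and are described in the
drbd.conf man page. A fragment just to show where they go (the values are
the usual defaults or placeholders, not a recommendation for your setup):

resource r0 {
  net {
    timeout       60;   # 6 seconds, in units of 0.1s
    ping-int      10;   # seconds between keep-alive pings
    ping-timeout   5;   # 0.5 seconds, in units of 0.1s
    connect-int   10;   # seconds between reconnection attempts
    ko-count       7;   # give up on a peer that blocks writes for ko-count * timeout
  }
  ...
}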

> Now, I'm looking for similar configuration options for DRBD, but I don't
> find very much in the documentation. I find something about disk flushes,
> but I don't see how to configure it here in a finely granular manner. On
> the subject of locking and configuring timeouts I don't find anything
> helpful. Can you please give me a hint where in the documentation or
> elsewhere I may find further information?
>
>
> many thanks
>
> best regards
>
> Simmr
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD9 daemon not started at startup

2016-11-20 Thread Igor Cicimov
On 21 Nov 2016 1:48 am, "Jasmin J."  wrote:
>
> Hi!
>
> I am playing with Proxmox 4.3 and DRBD9.
> I followed this guide (https://pve.proxmox.com/wiki/DRBD#Disk_for_DRBD),
> because it explains how to set up DRBD (8.x) on top of a physical disk (I
> don't want to use it on top of LVM). This guide is no longer 100% correct,
> but it was good enough to configure a working DRBD9 shared storage for Proxmox.
>
> I was able to configure DRBD9 correctly and it is running and synchronized.
> But after rebooting the server, the daemon wasn't started automatically.
> I did it with "/etc/init.d/drbd start" ("drbdadm up |all" also starts it).
>
> In /etc/init.d/drbd there is no runlevel defined in "Default-Start:", so
> it is never started by the init process, and even "update-rc.d drbd
> defaults" didn't change anything.
>
> What is the intended procedure to start this service?
>
It is a kernel module not a process.
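
The resources still have to be brought up at boot, though. If the packaged
init script doesn't enable itself, one workaround (a sketch only, not the
packaged unit; adjust paths to your system) is a small systemd unit that
runs drbdadm up:

# /etc/systemd/system/drbd-up.service
[Unit]
Description=Bring up all DRBD resources
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/drbdadm up all
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

followed by "systemctl daemon-reload && systemctl enable drbd-up.service".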

> BR,
>Jasmin
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD8.4 + Corosync + MySQL

2016-11-16 Thread Igor Cicimov
Hi Brandon,

On Wed, Nov 16, 2016 at 5:17 PM, Brandon Chapman 
wrote:

> Hello.
>
> I am trying to integrate MySQL into my cluster currently setup with
> drbd8.4.5 on ubuntu16.04.1LTS.
> I have mostly followed this guide here:
>
> http://www.tokiwinter.com/clustering-with-drbd-corosync-and-pacemaker/
>
> Adapting for my environment, mainly that I am not running centos.
> I have successfully configured the resources and user agents listed in
> this guide, but need to add MySQL HA as well.
>
> This is a current result of crm configure show:
>
> node 1: alpha \
> attributes standby=off
> node 2: beta \
> attributes standby=on
> primitive drbd_res ocf:linbit:drbd \
> params drbd_resource=r0 \
> op monitor interval=29s role=Master \
> op monitor interval=31s role=Slave
> primitive failover_ip IPaddr2 \
> params ip=76.213.77.43 cidr_netmask=24 \
> op monitor interval=30s
> primitive fs_res Filesystem \
> params device="/dev/drbd0" directory="/data" fstype=ext4
> primitive nginx_res nginx \
> params configfile="/etc/nginx/nginx.conf" httpd="/usr/sbin/nginx"
> \
> op monitor interval=0
> ms drbd_master_slave drbd_res \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
> order fs_after_drbd Mandatory: drbd_master_slave:promote fs_res:start
> colocation fs_drbd_colo inf: fs_res drbd_master_slave:Master
> order nginx_after_fs inf: fs_res nginx_res
> order nginx_after_ip Mandatory: failover_ip nginx_res
> colocation nginx_fs_colo inf: nginx_res fs_res
> colocation nginx_ip_colo inf: nginx_res failover_ip
> property cib-bootstrap-options: \
> have-watchdog=false \
> dc-version=1.1.14-70404b0 \
> cluster-infrastructure=corosync \
> cluster-name=debian \
> stonith-enabled=false \
> no-quorum-policy=ignore
>
> I have taken these steps so far:
>
>
> $$ apt install mysql-server -y
> $$ service mysql stop
> $$ systemctl disable mysql
> $$ sudo rsync -av /var/lib/mysql /data
> $$ nano /etc/mysql/mysql.conf.d/mysqld.cnf
>
> socket  = /data/mysql/mysqld.sock
> datadir = /data/mysql
>
> $$  echo "alias /var/lib/mysql/ -> /data/mysql/," >>
> /etc/apparmor.d/tunables/alias
> $$ service apparmor reload
> $$ crm configure primitive mysql_res ocf:heartbeat:mysql params
> binary="/usr/sbin/mysqld" datadir="/data/mysql" 
> socket="/data/mysql/mysqld.sock"
> op start timeout=60s op stop timeout=60s op monitor interval=15s
> $$ crm configure colocation mysql_ip_colo INFINITY: mysql_res failover_ip
> $$ crm configure order mysql_after_ip mandatory: failover_ip mysql_res
>
> This results in an initial start on the online node, but it attempts to
> start on standby as well. If I 'crm node standby' to the onine node, it
> will not start on the other node.
>
> Shows errors after bringing both nodes to standby, then bringing back
> online:
>
> https://i.imgur.com/kOmXGMn.png
>
> How can I go about solving this?
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
Any errors in the syslog at all?

Did you test if you can start MySQL manually as configured atm? Rule of
thumb, ALWAYS confirm the service can be started manually first before you
go with the cluster.

The "not installed" error suggests using a wrong RA parameter or error in
the OCF agent script itself. So, first check if you have all parameters
correctly stated in the crm configure commands:

# crm ra meta ocf:heartbeat:mysql

then run the agent manually and check for errors in the output of
ocf-tester:

# ocf-tester -v -n mysql_res -o OCF_ROOT=/usr/lib/ocf -o
binary="/usr/sbin/mysqld" -o datadir="/data/mysql" -o
socket="/data/mysql/mysqld.sock" /usr/lib/ocf/resource.d/heartbeat/mysql

Cheers,
Igor
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] ZFS

2016-10-17 Thread Igor Cicimov
On 17 Oct 2016 6:11 pm, "Gandalf Corvotempesta" <
gandalf.corvotempe...@gmail.com> wrote:
>
> > On 17 Oct 2016 at 09:01, "Jan Schermer"  wrote:
> >
> > 3 storages, many more hypervisors, data triplicated... that's the usual
scenario
>
> Are you using drbd9 with a 3 nodes replication?
> could you please share the drbd config?
>
> > We use ZFS on the storages, ZVOLs on top of that, each ZVOL makes up
part of the DRBD resource that gets exported to the hypervisor.
>
> No raid volumes?
> are you creating a single drbd resource for each storage disk?  (For
example, in a server with 12 disks are you creating 12 drbd resources?)
>
> I would like to create a raidz2 on each node but the resulting volume is
not seen as a block device, thus it can't be used for a drbd resource
>
And that's why you create a zvol and use it as the backing device for the
drbd device. How many zvols/drbds you create depends on the raidz2 size and
your needs. Each zvol-backed drbd is then given to the hypervisor to be
used as a VM volume.
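
A minimal sketch of that layering (pool, zvol and resource names are made
up for illustration):

# carve a zvol out of the raidz2 pool; it shows up as a block device
zfs create -V 200G tank/drbd_r0

# r0.res fragment: use the zvol as the DRBD backing disk
resource r0 {
  device    /dev/drbd0;
  disk      /dev/zvol/tank/drbd_r0;
  meta-disk internal;
  ...
}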

> I can create a raid6 with mdadm, use that volume for drbd and put zfs on
top of it but there are too many moving parts stacked together, that would
be a perfect recipe for a huge mess
>
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


drbd-user@lists.linbit.com

2016-09-27 Thread Igor Cicimov
On 27 Sep 2016 11:51 pm, "刘丹"  wrote:
>
> Failed to send to one or more email server, so send again.
>
>
>
> At 2016-09-27 15:47:37, "Nick Wang"  wrote:
> >>>> On 2016-9-26 at 19:17, in message, Igor Cicimov
> >>>> <ig...@encompasscorporation.com> wrote:
> >> On 26 Sep 2016 7:26 pm, "mzlld1988"  wrote:
> >> > I apply the attached patch file to scripts/drbd.ocf, then pacemaker can
> >> > start drbd successfully, but only two nodes, the third node's drbd is
> >> > down, is it right?
> >> Well you didn't say you have 3 nodes. Usually you use pacemaker with 2
> >> nodes and drbd.
> > The patch is supposed to help in a 3 (or more) node scenario, as long as
> > there is only one Primary.
> > Is the 3 node DRBD cluster working without pacemaker? And how did you
> > configure it in pacemaker?
> According to http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf, I
executed the following commands to configure drbd in pacemaker.
>  [root@pcmk-1 ~]# pcs cluster cib drbd_cfg
>  [root@pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd
\
>  drbd_resource=wwwdata op monitor interval=60s
>  [root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \
>  master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
>  notify=true
You need clone-max=3 for 3 nodes
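
i.e. when creating the master resource, something like this (same names as
in your commands, untested):

  [root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \
  master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 \
  notify=true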

>  [root@pcmk-1 ~]# pcs cluster cib-push drbd_cfg
> >> > And another question is, can pacemaker successfully stop the slave
> >> > node? My result is pacemaker can't stop the slave node.
> > Yes, need to check the log to see which resource prevents pacemaker from
> > stopping.
>
> Pacemaker can't stop the slave node's drbd. I think the reason may be the
same as in my previous email (see attached file), but no one replied to that
email.
> [root@drbd ~]# pcs status
>  Cluster name: mycluster
>  Stack: corosync
>  Current DC: drbd.node103 (version 1.1.15-e174ec8) - partition with quorum
>  Last updated: Mon Sep 26 04:36:50 2016  Last change: Mon Sep 26
04:36:49 2016 by root via cibadmin on drbd.node101
>
> 3 nodes and 2 resources configured
>
> Online: [ drbd.node101 drbd.node102 drbd.node103 ]
>
> Full list of resources:
>
>  Master/Slave Set: WebDataClone [WebData]
>Masters: [ drbd.node102 ]
>Slaves: [ drbd.node101 ]
>
> Daemon Status:
>corosync: active/corosync.service is not a native service, redirecting
to /sbin/chkconfig.
>  Executing /sbin/chkconfig corosync --level=5
>  enabled
>pacemaker: active/pacemaker.service is not a native service,
redirecting to /sbin/chkconfig.
>  Executing /sbin/chkconfig pacemaker --level=5
>  enabled
>pcsd: active/enabled
>
> -
> Failed to execute ‘pcs cluster stop drbd.node101’
>
> =Error message on drbd.node101(secondary node)
>   Sep 26 04:39:26 drbd lrmd[3521]:  notice: WebData_stop_0:4726:stderr [
Command 'drbdsetup down r0' terminated with exit code 11 ]
>   Sep 26 04:39:26 drbd lrmd[3521]:  notice: WebData_stop_0:4726:stderr [
r0: State change failed: (-10) State change was refused by peer node ]
>   Sep 26 04:39:26 drbd lrmd[3521]:  notice: WebData_stop_0:4726:stderr [
additional info from kernel: ]
>   Sep 26 04:39:26 drbd lrmd[3521]:  notice: WebData_stop_0:4726:stderr [
failed to disconnect ]
>   Sep 26 04:39:26 drbd lrmd[3521]:  notice: WebData_stop_0:4726:stderr [
Command 'drbdsetup down r0' terminated with exit code 11 ]
>   Sep 26 04:39:26 drbd crmd[3524]:   error: Result of stop operation for
WebData on drbd.node101: Timed Out | call=12 key=WebData_stop_0
timeout=10ms
>   Sep 26 04:39:26 drbd crmd[3524]:  notice:
drbd.node101-WebData_stop_0:12 [ r0: State change failed: (-10) State
change was refused by peer node\nadditional info from kernel:\nfailed to
disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0:
State change failed: (-10) State change was refused by peer
node\nadditional info from kernel:\nfailed to disconnect\nCommand
'drbdsetup down r0' terminated with exit code 11\nr0: State change failed:
(-10) State change was refused by peer node\nadditional info from
kernel:\nfailed t
>  =Error message on drbd.node102(primary node)
>   Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Preparing remote
state change 3578772780 (primary_nodes=4, weak_nodes=FFFB)
>   Sep 26 04:39:25 drbd kernel: drbd r0: State change failed: Refusing to
be Primary while peer is not outdated
>   Sep 26 04:39:25 drbd kernel: drbd r0: Failed: susp-io( no -> fencing)
>   Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Failed: conn(
Connected -> TearDown ) peer( Secondary -> Unknown )
>   Sep 26 04:39:25 drbd kernel: drbd r0/0 drbd1 drbd.node101: Failed:
pdsk( UpToDate ->

drbd-user@lists.linbit.com

2016-09-26 Thread Igor Cicimov
On 26 Sep 2016 7:26 pm, "mzlld1988"  wrote:
>
> I apply the attached patch file to scripts/drbd.ocf,then pacemaker can
start drbd successfully,but only two nodes, the third node's drbd is
down,is it right?
Well, you didn't say you have 3 nodes. Usually you use pacemaker with 2
nodes and drbd.

> And another question is, can pacemaker successfully stop the slave node?
My result is pacemaker can't stop the slave node.
>
> I'm looking forward to your answers.Thanks.
>
Yes, it works with 2-node drbd9 configured in the standard way, not via
drbdmanage. Haven't tried any other layout.

>
>
> Sent from my Mi phone
> On Igor Cicimov , Sep 25, 2016 9:15 AM
wrote:
>>
>>
>>
>> On Fri, Sep 23, 2016 at 7:16 PM, mzlld1988  wrote:
>>>
>>> Hello, everyone
>>>
>>> I have a question about using DRBD9 with pacemaker 1.1.15 .
>>>
>>> Does DRBD9 can be used in pacemaker?
>>
>>
>> No, not yet but Linbit is working on it as they say. For now you need to
apply the attached patch to the drbd ocf agent.
>>
>>>
>>> Why do my cluster can't start drbd?
>>>
>>> Error message:
>>> "r0 is a normal resource , and not available in stacked mode."
>>>
>>>
>>> Thanks.
>>>
>>> Best Regards.
>>>
>>> ___
>>> drbd-user mailing list
>>> drbd-user@lists.linbit.com
>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>>
>>
>>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


drbd-user@lists.linbit.com

2016-09-24 Thread Igor Cicimov
On Sun, Sep 25, 2016 at 11:15 AM, Igor Cicimov <
ig...@encompasscorporation.com> wrote:

>
>
> On Fri, Sep 23, 2016 at 7:16 PM, mzlld1988  wrote:
>
>> Hello, everyone
>>
>> I have a question about using DRBD9 with pacemaker 1.1.15 .
>>
>> Does DRBD9 can be used in pacemaker?
>>
>
> No, not yet but Linbit is working on it as they say. For now you need to
> apply the attached patch to the drbd ocf agent.
>

Forgot to include the reference:
https://www.mail-archive.com/drbd-user@lists.linbit.com/msg10171.html


>
>> Why do my cluster can't start drbd?
>>
>> Error message:
>> "r0 is a normal resource , and not available in stacked mode."
>>
>>
>> Thanks.
>>
>> Best Regards.
>>
>> ___
>> drbd-user mailing list
>> drbd-user@lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>>
>
> Cheers,
Igor
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


drbd-user@lists.linbit.com

2016-09-24 Thread Igor Cicimov
On Fri, Sep 23, 2016 at 7:16 PM, mzlld1988  wrote:

> Hello, everyone
>
> I have a question about using DRBD9 with pacemaker 1.1.15 .
>
> Does DRBD9 can be used in pacemaker?
>

No, not yet but Linbit is working on it as they say. For now you need to
apply the attached patch to the drbd ocf agent.


> Why do my cluster can't start drbd?
>
> Error message:
> "r0 is a normal resource , and not available in stacked mode."
>
>
> Thanks.
>
> Best Regards.
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
diff --git a/scripts/drbd.ocf b/scripts/drbd.ocf
index 632e16e..91990fc 100755
--- a/scripts/drbd.ocf
+++ b/scripts/drbd.ocf
@@ -328,6 +328,23 @@ remove_master_score() {
 	do_cmd ${HA_SBIN_DIR}/crm_master -l reboot -D
 }
 
+_peer_node_process() {
+	# _since drbd9 support multiple connections
+	: ${_peer_node_id:=0}
+	DRBD_PER_NAME[$_peer_node_id]=$_conn_name
+	DRBD_PER_ID[$_peer_node_id]=$_peer_node_id
+	DRBD_PER_CSTATE[$_peer_node_id]=$_cstate
+	DRBD_PER_ROLE_REMOTE[$_peer_node_id]=${_peer:-Unknown}
+	DRBD_PER_DSTATE_REMOTE[$_peer_node_id]=${_pdsk:-DUnknown}
+
+	: == DEBUG == _peer_node_id == ${_peer_node_id} ==
+	: == DEBUG == DRBD_PER_NAME[_peer_node_id]  == ${DRBD_PER_NAME[${_peer_node_id}]} ==
+	: == DEBUG == DRBD_PER_ID[_peer_node_id]== ${DRBD_PER_ID[${_peer_node_id}]} ==
+	: == DEBUG == DRBD_PER_CSTATE[_peer_node_id]== ${DRBD_PER_CSTATE[${_peer_node_id}]} ==
+	: == DEBUG == DRBD_PER_ROLE_REMOTE[_peer_node_id]   == ${DRBD_PER_ROLE_REMOTE[${_peer_node_id}]} ==
+	: == DEBUG == DRBD_PER_DSTATE_REMOTE[_peer_node_id] == ${DRBD_PER_DSTATE_REMOTE[${_peer_node_id}]} ==
+}
+
 _sh_status_process() {
 	# _volume not present should not happen,
 	# but may help make this agent work even if it talks to drbd 8.3.
@@ -335,11 +352,36 @@ _sh_status_process() {
 	# not-yet-created volumes are reported as -1
 	(( _volume >= 0 )) || _volume=$[1 << 16]
 	DRBD_ROLE_LOCAL[$_volume]=${_role:-Unconfigured}
-	DRBD_ROLE_REMOTE[$_volume]=${_peer:-Unknown}
-	DRBD_CSTATE[$_volume]=$_cstate
 	DRBD_DSTATE_LOCAL[$_volume]=${_disk:-Unconfigured}
-	DRBD_DSTATE_REMOTE[$_volume]=${_pdsk:-DUnknown}
+
+	if $DRBD_VERSION_9 ; then
+		#Get from _peer_node_process
+		DRBD_NAME[$_volume]=${DRBD_PER_NAME[@]}
+		DRBD_ID[$_volume]=${DRBD_PER_ID[@]}
+		DRBD_VOLUME[$_volume]=${_volume}
+		DRBD_CSTATE[$_volume]=${DRBD_PER_CSTATE[@]}
+		DRBD_ROLE_REMOTE[$_volume]=${DRBD_PER_ROLE_REMOTE[@]}
+		DRBD_DSTATE_REMOTE[$_volume]=${DRBD_PER_DSTATE_REMOTE[@]}
+
+		DRBD_PER_NAME=()
+		DRBD_PER_ID=()
+		DRBD_PER_CSTATE=()
+		DRBD_PER_ROLE_REMOTE=()
+		DRBD_PER_DSTATE_REMOTE=()
+
+		: == DEBUG == _volume== ${_volume} ==
+		: == DEBUG == DRBD_ROLE_LOCAL== ${DRBD_ROLE_LOCAL[${_volume}]} ==
+		: == DEBUG == DRBD_DSTATE_LOCAL  == ${DRBD_DSTATE_LOCAL[${_volume}]} ==
+		: == DEBUG == DRBD_CSTATE== ${DRBD_CSTATE[${_volume}]} ==
+		: == DEBUG == DRBD_ROLE_REMOTE   == ${DRBD_ROLE_REMOTE[${_volume}]} ==
+		: == DEBUG == DRBD_DSTATE_REMOTE == ${DRBD_DSTATE_REMOTE[${_volume}]} ==
+	else
+		DRBD_CSTATE[$_volume]=$_cstate
+		DRBD_ROLE_REMOTE[$_volume]=${_peer:-Unknown}
+		DRBD_DSTATE_REMOTE[$_volume]=${_pdsk:-DUnknown}
+	fi
 }
+
 drbd_set_status_variables() {
 	# drbdsetup sh-status prints these values to stdout,
 	# and then prints _sh_status_process.
@@ -352,6 +394,15 @@ drbd_set_status_variables() {
 	local _resynced_percent
 	local out
 
+	if $DRBD_VERSION_9 ; then
+		local _peer_node_id _conn_name
+		DRBD_PER_NAME=()
+		DRBD_PER_ID=()
+		DRBD_PER_CSTATE=()
+		DRBD_PER_ROLE_REMOTE=()
+		DRBD_PER_DSTATE_REMOTE=()
+	fi
+
 	DRBD_ROLE_LOCAL=()
 	DRBD_ROLE_REMOTE=()
 	DRBD_CSTATE=()
@@ -369,16 +420,20 @@ drbd_set_status_variables() {
 	# if there was no output at all, or a weird output
 	# make sure the status arrays won't be empty.
 	[[ ${#DRBD_ROLE_LOCAL[@]}!= 0 ]] || DRBD_ROLE_LOCAL=(Unconfigured)
-	[[ ${#DRBD_ROLE_REMOTE[@]}   != 0 ]] || DRBD_ROLE_REMOTE=(Unknown)
-	[[ ${#DRBD_CSTATE[@]}!= 0 ]] || DRBD_CSTATE=(Unconfigured)
 	[[ ${#DRBD_DSTATE_LOCAL[@]}  != 0 ]] || DRBD_DSTATE_LOCAL=(Unconfigured)
+	[[ ${#DRBD_CSTATE[@]}!= 0 ]] || DRBD_CSTATE=(Unconfigured)
+	[[ ${#DRBD_ROLE_REMOTE[@]}   != 0 ]] || DRBD_ROLE_REMOTE=(Unknown)
 	[[ ${#DRBD_DSTATE_REMOTE[@]} != 0 ]] || DRBD_DSTATE_REMOTE=(DUnknown)
 
-
+	if $DRBD_VERSION_9 ; then
+		: == DEBUG == DRBD_NAME== ${DRBD_NAME[@]} ==
+		: == DEBUG == DRBD_ID== ${DRBD_ID[@]} ==
+		: == DEBUG == DRBD_VOLUME== ${DRBD_VOLUME[@]} ==
+	fi
 	: == DEBUG == DRBD_ROLE_LOCAL== ${DRBD_ROLE_LOCAL[@]} ==
-	: == DEBUG == DRBD_ROLE_REMOTE   == ${DRBD_ROLE_REMOTE[@]} ==
-	: == DEBUG == DRBD_CSTATE== ${DRBD_CSTATE[@]} ==
 	: == DEBUG == DRBD_DSTATE_LOCAL  == ${DRBD_DSTATE_LOCAL[@]} ==
+	: == DEBUG == DRBD_CSTATE== ${DRBD_CSTATE[@]} ==
+	: == DEBUG == DRBD_ROLE_REMOTE   == ${DRBD_ROLE_REMOTE[@]} ==
 	: == DEBUG == DRBD_DS

Re: [DRBD-user] Drbd/pacemaker active/passive san failover

2016-09-20 Thread Igor Cicimov
On Wed, Sep 21, 2016 at 1:17 AM, Marco Marino  wrote:

> As told by Lars Ellenberg, one first problem with the configuration
> http://pastebin.com/r3N1gzwx
> is that on-io-error should be
> on-io-error call-local-io-error;
>

And in your specific case that would have shut down both servers since both
had io-errors. I don't see how that could help.


> and not detach. Furthermore, in the configuration there is also another
> error:
> fencing should be
> fencing resource-and-stonith;
> and not resource-only.
>

Only if you have fencing configured in Pacemaker. Do you?


>
> But I don't understand (again) why the secondary node becomes diskless 
> (UpToDate -> Failed and then Failed -> Diskless).
>
> I'd like to do one (stupid) example: if I have 2 nodes with 1 disk for each 
> node used as backend for a drbd resource and one of these disks fails, 
> nothing should happen on the secondary node.
>
> Igor Cicimov: why removing the write-back cache drive on the primary node 
> cause problems also on the secondary node? What is the dynamics involved?
>
>
As Lars pointed out, it is up to you to figure it out by examining the logs
and your setup. One possible reason is that there was in-flight data that
was flushed from the cache and replicated to the secondary when you removed
the ssd, and the secondary received a corrupt stream that could not be
written to disk. It is also possible that you already had a problem on the
secondary, which went diskless even *before* you created the issue on the
primary. Comparing the timestamps in both servers' logs should tell you if
that was the case.

> However, root file system is not part of the CacheCade virtual drive and yes, 
> one possible solution could be create a mirror of ssd drives for CacheCade. 
> But I'm using drbd/pacemaker because
> in a similar situation I need to switch resources automatically on the other 
> node.
>
>
>
>
> 2016-09-20 13:12 GMT+02:00 Igor Cicimov :
>
>> On Tue, Sep 20, 2016 at 7:13 PM, Marco Marino 
>> wrote:
>>
>>> mmm... This means that I do not understood this policy. I thought that
>>> I/O error happens only on the primary node, but it seems that all nodes
>>> become diskless in this case. Why? Basically I have an I/O error on the
>>> primary node because I removed wrongly the ssd (cachecade) disk. Why also
>>> the secondary node is affected??
>>>
>>
>> The problem is as I see it that when the io-error happened on the
>> secondary the disk was not UpToDate any more:
>>
>> Sep  7 19:55:19 iscsi2 kernel: block drbd1: disk( *UpToDate -> Failed* )
>>
>> in which case it can not be promoted to primary. I don't think what ever
>> policy you had in those handlers it would had made any difference in your
>> case. By removing the write-back cache drive in the mid of operation you
>> caused damage on both ends. Even if you had any chance by force, would you
>> really want to promote a secondary that has a corrupt data to primary at
>> this point?
>>
>> You might try the call-local-io-error option as suggested by Lars or even
>> the pass_on and let the file system handle it. You should also take
>> Digimer's suggestion and let Pacemaker take care of everything since you
>> have it already installed so why not use it. You need proper functioning
>> fencing though in that case.
>>
>> As someone else suggested you should also remove the root file system
>> from the CacheCade virtual drive (just an assumption but looks like that is
>> the case). Creating a mirror of SSD drives for the CacheCade is also an
>> option to avoid similar accidents in the future (what is the chance that
>> someone removes 2 drives in the same time??). And finally putting a "DON'T
>> REMOVE" sticker on the drive might work if nothing else does :-D
>>
>>
>>> And furthermore, using
>>>
>>> local-io-error "/usr/lib/drbd/notify-io-error.sh; 
>>> /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; 
>>> halt -f";
>>>
>>> will be shut down both nodes? and again, should I remove on-io-error 
>>> detach; if I use local-io-error?
>>>
>>> Thank you
>>>
>>>
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Drbd/pacemaker active/passive san failover

2016-09-20 Thread Igor Cicimov
On Tue, Sep 20, 2016 at 7:13 PM, Marco Marino  wrote:

> mmm... This means that I do not understood this policy. I thought that I/O
> error happens only on the primary node, but it seems that all nodes become
> diskless in this case. Why? Basically I have an I/O error on the primary
> node because I removed wrongly the ssd (cachecade) disk. Why also the
> secondary node is affected??
>

The problem is as I see it that when the io-error happened on the secondary
the disk was not UpToDate any more:

Sep  7 19:55:19 iscsi2 kernel: block drbd1: disk( *UpToDate -> Failed* )

in which case it can not be promoted to primary. I don't think what ever
policy you had in those handlers it would had made any difference in your
case. By removing the write-back cache drive in the mid of operation you
caused damage on both ends. Even if you had any chance by force, would you
really want to promote a secondary that has a corrupt data to primary at
this point?

You might try the call-local-io-error option as suggested by Lars or even
the pass_on and let the file system handle it. You should also take
Digimer's suggestion and let Pacemaker take care of everything since you
have it already installed so why not use it. You need proper functioning
fencing though in that case.
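
Just to be clear, that would mean something like this in the disk section,
instead of detach:

disk {
on-io-error call-local-io-error;   # or pass_on to hand the error up
}

together with a local-io-error handler along the lines of the one quoted
below.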

As someone else suggested you should also remove the root file system from
the CacheCade virtual drive (just an assumption but looks like that is the
case). Creating a mirror of SSD drives for the CacheCade is also an option
to avoid similar accidents in the future (what is the chance that someone
removes 2 drives in the same time??). And finally putting a "DON'T REMOVE"
sticker on the drive might work if nothing else does :-D


> And furthermore, using
>
> local-io-error "/usr/lib/drbd/notify-io-error.sh; 
> /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; 
> halt -f";
>
> will be shut down both nodes? and again, should I remove on-io-error detach; 
> if I use local-io-error?
>
> Thank you
>
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Drbd/pacemaker active/passive san failover

2016-09-20 Thread Igor Cicimov
On 20 Sep 2016 5:00 pm, "Marco Marino"  wrote:
>
> Furthermore there are logs from the secondary node:
>
> http://pastebin.com/A2ySXDCB
>
>
> Please compare time. It seems that also on the secondary node drbd goes
to diskless mode. Why?
>
In the secondary log you can see I/O errors too:

Sep  7 19:55:19 iscsi2 kernel: end_request: I/O error, dev sdb, sector
685931856
Sep  7 19:55:19 iscsi2 kernel: block drbd1: write: error=-5 s=685931856s
Sep  7 19:55:19 iscsi2 kernel: block drbd1: disk( UpToDate -> Failed )
Sep  7 19:55:19 iscsi2 kernel: block drbd1: Local IO failed in
drbd_endio_write_sec_final. Detaching...

and since your policy is:

disk {
on-io-error detach;
}

that's what drbd did. No disk => no master.

>
>
> 2016-09-20 8:44 GMT+02:00 Marco Marino :
>>
>> Hi, logs can be found here: http://pastebin.com/BGR33jN6
>>
>> @digimer:
>> Using local-io-error should power off the node and switch the cluster on
the remaing node is this a good idea?
>>
>> Regards,
>> Marco
>>
>> 2016-09-19 12:58 GMT+02:00 Adam Goryachev :
>>>
>>>
>>>
>>> On 19/09/2016 19:06, Marco Marino wrote:
>>>>
>>>>
>>>>
>>>> 2016-09-19 10:50 GMT+02:00 Igor Cicimov :
>>>>>
>>>>> On 19 Sep 2016 5:45 pm, "Marco Marino"  wrote:
>>>>> >
>>>>> > Hi, I'm trying to build an active/passive cluster with drbd and
pacemaker for a san. I'm using 2 nodes with one raid controller (megaraid)
on each one. Each node has an ssd disk that works as cache for read (and
write?) realizing the CacheCade proprietary tecnology.
>>>>> >
>>>>> Did you configure the CacheCade? If the write cache was enabled in
write-back mode then suddenly removing the device from under the controller
would have caused serious problems I guess since the controller expects to
write to the ssd cache firts and then flush to the hdd's. Maybe this
explains the read only mode?
>>>>
>>>> Good point. It is exactly as you wrote. How can I mitigate this
behavior in a clustered (active/passive) enviroment??? As I told in the
other post, I think the best solution is to poweroff the node using
local-io-error and switch all resources on the other node But please
give me some suggestions
>>>>
>>>
>>>>
>>>>>
>>>>> > Basically, the structure of the san is:
>>>>> >
>>>>> > Physycal disks -> RAID -> Device /dev/sdb in the OS -> Drbd
resource (that use /dev/sdb as backend) (using pacemaker with a
master/slave resource) -> VG (managed with pacemaker) -> Iscsi target (with
pacemaker) -> Iscsi LUNS (one for each logical volume in the VG, managed
with pacemaker)
>>>>> >
>>>>> > Few days ago, the ssd disk was wrongly removed from the primary
node of the cluster and this caused a lot of problems: drbd resource and
all logical volumes went in readonly mode with a lot of I/O errors but the
cluster did not switched to the other node. All filesystem on initiators
went to readonly mode. There are 2 problems involved here (I think): 1) Why
removing the ssd disk cause a readonly mode with I/O errors? This means
that the ssd is a single point of failure for a single node san with
megaraid controllers and CacheCade tecnology. and 2) Why drbd not
worked as espected?
>>>>> What was the state in /proc/drbd ?
>>>>
>>>>
>>> I think you will need to examine the logs to find out what happened. It
would appear (just making a wild guess) that either the cache is happening
between DRBD and iSCSI instead of between DRBD and RAID. If it happened
under DRBD then DRBD should see the read/write error, and should
automatically fail the local storage. It wouldn't necessarily failover to
the secondary, but it would do all read/write from the secondary node. The
fact this didn't happen makes it look like the failure happened above DRBD.
>>>
>>> At least that is my understanding of how it will work in that scenario.
>>>
>>> Regards,
>>> Adam
>>>
>>> ___
>>> drbd-user mailing list
>>> drbd-user@lists.linbit.com
>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>>
>>
>
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Drbd/pacemaker active/passive san failover

2016-09-19 Thread Igor Cicimov
On 19 Sep 2016 5:45 pm, "Marco Marino"  wrote:
>
> Hi, I'm trying to build an active/passive cluster with drbd and pacemaker
for a san. I'm using 2 nodes with one raid controller (megaraid) on each
one. Each node has an ssd disk that works as cache for read (and write?)
realizing the CacheCade proprietary tecnology.
>
Did you configure the CacheCade? If the write cache was enabled in
write-back mode, then suddenly removing the device from under the
controller would have caused serious problems, I guess, since the
controller expects to write to the ssd cache first and then flush to the
hdd's. Maybe this explains the read-only mode?

> Basically, the structure of the san is:
>
> Physycal disks -> RAID -> Device /dev/sdb in the OS -> Drbd resource
(that use /dev/sdb as backend) (using pacemaker with a master/slave
resource) -> VG (managed with pacemaker) -> Iscsi target (with pacemaker)
-> Iscsi LUNS (one for each logical volume in the VG, managed with
pacemaker)
>
> Few days ago, the ssd disk was wrongly removed from the primary node of
the cluster and this caused a lot of problems: drbd resource and all
logical volumes went in readonly mode with a lot of I/O errors but the
cluster did not switched to the other node. All filesystem on initiators
went to readonly mode. There are 2 problems involved here (I think): 1) Why
removing the ssd disk cause a readonly mode with I/O errors? This means
that the ssd is a single point of failure for a single node san with
megaraid controllers and CacheCade tecnology. and 2) Why drbd not
worked as espected?
What was the state in /proc/drbd ?

> For point 1) I'm checking with the vendor and I doubt that I can do
something
> For point 2) I have errors in the drbd configuration. My idea is that
when an I/O error happens on the primary node, the cluster should switch to
the secondary node and shut down the damaged node.
> Here -> http://pastebin.com/79dDK66m it is possible to see the actual
drbd configuration, but I need to change a lot of things and I want to
share my ideas here:
>
> 1) The "handlers" section should be moved in the "common" section of
global_common.conf and not in the resource file.
>
> 2)I'm thinking to modify the "handlers" section as follow:
>
> handlers { pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
reboot -f"; pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
reboot -f"; local-io-error "/usr/lib/drbd/notify-io-error.sh;
/usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ;
halt -f";   # Hook into Pacemaker's fencing. fence-peer
"/usr/lib/drbd/crm-fence-peer.sh"; }
>
>
> In this way, when an I/O error happens, the node will be powered off and
pacemaker will switch resources to the other node (or at least doesn't
create problematic behaviors...)
>
> 3) I'm thinking to move the "fencing" directive from the resource to the
global_common.conf file. Furthermore, I want to change it to
>
> fencing resource-and-stonith;
>
>
> 4) Finally, in the global "net" section I need to add:
>
> after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
>
> At the end of the work configuration will be ->
http://pastebin.com/r3N1gzwx
>
> Please, give me suggestion about mistakes and possible changes.
>
> Thank you
>
>
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Proxmox VE 4.2 and DRBD 9

2016-09-05 Thread Igor Cicimov
e export vm-100-disk-2
> which will recreate the configuration file in /var/lib/drbd.d for use with
> drbdadm
>
> This command will also implicitly start the drbdmanage server if it is not
> yet started (otherwise, drbdmanage startup can be used to start the
> drbdmanage server, which also starts all resources managed by drbdmanage).
>
> Apart from that, since the resource appears to be running, but is marked
> as Outdated and StandAlone, there might also be something wrong with the
> resource itself. If the resource does not reconnect/resync after a 'drbdadm
> adjust' command, the system log should be checked for messages regarding
> the state of that resource (such as e.g. a split-brain alert)
>
> --
> Robert Altnoeder
> DRBD - Corosync - Pacemaker
> +43 (1) 817 82 92 - 0 <43181782920>
> robert.altnoe...@linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>
>
> ___
> drbd-user mailing 
> listdrbd-user@lists.linbit.comhttp://lists.linbit.com/mailman/listinfo/drbd-user
>
>
> --
> Bien cordialement, Jean-Daniel TISSOT
> <http://chrono-environnement.univ-fcomte.fr/spip.php?article457>
> Administrateur Systèmes et Réseaux
> Tel: +33 3 81 666 440 Fax: +33 3 81 666 568
>
> Laboratoire Chrono-environnement
> <http://chrono-environnement.univ-fcomte.fr/>
> 16, Route de Gray
> 25030 BESANÇON Cédex
>
> Plan et Accès
> <https://mapsengine.google.com/map/viewer?mid=zjsxW4ZzZPLY.kp2qPHUBD45c>
>
>
> --
> Bien cordialement, Jean-Daniel TISSOT
> <http://chrono-environnement.univ-fcomte.fr/spip.php?article457>
> Administrateur Systèmes et Réseaux
> Tel: +33 3 81 666 440 Fax: +33 3 81 666 568
>
> Laboratoire Chrono-environnement
> <http://chrono-environnement.univ-fcomte.fr/>
> 16, Route de Gray
> 25030 BESANÇON Cédex
>
> Plan et Accès
> <https://mapsengine.google.com/map/viewer?mid=zjsxW4ZzZPLY.kp2qPHUBD45c>
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>


-- 
Igor Cicimov | DevOps


p. +61 (0) 433 078 728
e. ig...@encompasscorporation.com <http://encompasscorporation.com/>
w. www.encompasscorporation.com
a. Level 4, 65 York Street, Sydney 2000
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] oracle stop timeout while drbd resync

2016-08-31 Thread Igor Cicimov
On Thu, Sep 1, 2016 at 9:02 AM, Igor Cicimov  wrote:

> On 1 Sep 2016 1:16 am, "Mia Lueng"  wrote:
> >
> > Yes, Oracle & drbd is running under pacemaker just in
> > primary/secondary mode. I stopped the oracle resource during DRBD is
> > resyncing and the oracle hangup
> >
> > 2016-08-31 14:38 GMT+08:00 Igor Cicimov  >:
> > >
> > >
> > > On Wed, Aug 31, 2016 at 3:49 PM, Mia Lueng 
> wrote:
> > >>
> > >> Hi:
> > >> I have a cluster with four drbd devices. I found oracle stopped
> > >> timeout while drbd is in resync state.
> > >> oracle is blocked like following:
> > >>
> > >> oracle6869  6844  0.0  0.0 71424 12616 ?S16:28
> > >> 00:00:00 pipe_wait
> > >> /oracle/app/oracle/dbhome_1/bin/sqlplus
> > >> @/tmp/ora_ommbb_shutdown.sql
> > >> oracle6870  6869  0.0  0.1 4431856 26096 ?   Ds   16:28
> > >> 00:00:00 get_write_access oracleommbb
> > >> (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
> > >>
> > >>
> > >> drbd state
> > >>
> > >> 2016-08-30 16:33:32 Dump [/proc/drbd] ...
> > >> =
> > >> version: 8.3.16 (api:88/proto:86-97)
> > >> GIT-hash: bbf851ee755a878a495cfd93e1a76bf90dc79442 Makefile.in build
> > >> by drbd@build 2012-06-07 16:03:04
> > >> 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B
> r-
> > >>   ns:2777568 nr:0 dw:492604 dr:3305833 al:4761 bm:439 lo:31 pe:613
> > >> ua:0 ap:31 ep:1 wo:d oos:4144796
> > >>[==>.] sync'ed: 35.7% (4044/6280)M
> > >>finish: 0:10:19 speed: 6,680 (3,664) K/sec
> > >> 1: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent B
> r-
> > >>   ns:3709600 nr:0 dw:854764 dr:7632085 al:7689 bm:3401 lo:38 pe:3299
> > >> ua:38 ap:0 ep:1 wo:d oos:6204676
> > >>[===>] sync'ed: 41.5% (6056/10340)M
> > >>finish: 0:22:14 speed: 4,640 (10,016) K/sec
> > >> 2: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B
> r-
> > >>   ns:3968883 nr:0 dw:127937 dr:5179641 al:190 bm:304 lo:1 pe:139 ua:0
> > >> ap:7 ep:1 wo:d oos:2124792
> > >>[>...] sync'ed: 66.3% (2072/6144)M
> > >>finish: 0:06:12 speed: 5,692 (6,668) K/sec
> > >> 3: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B
> r-
> > >>   ns:89737 nr:0 dw:439073 dr:2235186 al:724 bm:35 lo:0 pe:45 ua:0 ap:7
> > >> ep:1 wo:d oos:8131104
> > >>[>] sync'ed:  1.6% (7940/8064)M
> > >>finish: 10:44:09 speed: 208 (204) K/sec (stalled)
> > >>
> > >> Is this a known bug and fixed in the further version?
> > >> ___
> > >> drbd-user mailing list
> > >> drbd-user@lists.linbit.com
> > >> http://lists.linbit.com/mailman/listinfo/drbd-user
> > >
> > >
> > > Maybe provide more details about the term "cluster" you are using. Do
> you
> > > have DRBD under control of crm like Pacemaker? If so are you running
> DRBD in
> > > dual primary mode maybe? And when does this state happen and under what
> > > conditions i.e restarted a node etc.
>
> What os is this on? Can you please paste the output of "crm status" (or
> pcs if you are on rhel7) and "crm_mon -Qrf1"
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
Another thing I forgot: I find it odd that the sync for only one of the
devices is stalled. Are they all using the same replication link? Any
networking issues or network card errors you can see?
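
Something like this on both ends would show it quickly, replace eth1 with
whatever your replication interface is:

# ip -s link show eth1
# ethtool -S eth1 | grep -iE 'err|drop'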
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] oracle stop timeout while drbd resync

2016-08-31 Thread Igor Cicimov
On 1 Sep 2016 9:02 am, "Igor Cicimov" 
wrote:
>
> On 1 Sep 2016 1:16 am, "Mia Lueng"  wrote:
> >
> > Yes, Oracle & drbd is running under pacemaker just in
> > primary/secondary mode. I stopped the oracle resource during DRBD is
> > resyncing and the oracle hangup
> >
> > 2016-08-31 14:38 GMT+08:00 Igor Cicimov :
> > >
> > >
> > > On Wed, Aug 31, 2016 at 3:49 PM, Mia Lueng 
wrote:
> > >>
> > >> Hi:
> > >> I have a cluster with four drbd devices. I found oracle stopped
> > >> timeout while drbd is in resync state.
> > >> oracle is blocked like following:
> > >>
> > >> oracle6869  6844  0.0  0.0 71424 12616 ?S16:28
> > >> 00:00:00 pipe_wait
> > >> /oracle/app/oracle/dbhome_1/bin/sqlplus
> > >> @/tmp/ora_ommbb_shutdown.sql
> > >> oracle6870  6869  0.0  0.1 4431856 26096 ?   Ds   16:28
> > >> 00:00:00 get_write_access oracleommbb
> > >> (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
> > >>
> > >>
> > >> drbd state
> > >>
> > >> 2016-08-30 16:33:32 Dump [/proc/drbd] ...
> > >> =
> > >> version: 8.3.16 (api:88/proto:86-97)
> > >> GIT-hash: bbf851ee755a878a495cfd93e1a76bf90dc79442 Makefile.in build
> > >> by drbd@build 2012-06-07 16:03:04
> > >> 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B
r-
> > >>   ns:2777568 nr:0 dw:492604 dr:3305833 al:4761 bm:439 lo:31 pe:613
> > >> ua:0 ap:31 ep:1 wo:d oos:4144796
> > >>[==>.] sync'ed: 35.7% (4044/6280)M
> > >>finish: 0:10:19 speed: 6,680 (3,664) K/sec
> > >> 1: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent B
r-
> > >>   ns:3709600 nr:0 dw:854764 dr:7632085 al:7689 bm:3401 lo:38 pe:3299
> > >> ua:38 ap:0 ep:1 wo:d oos:6204676
> > >>[===>] sync'ed: 41.5% (6056/10340)M
> > >>finish: 0:22:14 speed: 4,640 (10,016) K/sec
> > >> 2: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B
r-
> > >>   ns:3968883 nr:0 dw:127937 dr:5179641 al:190 bm:304 lo:1 pe:139 ua:0
> > >> ap:7 ep:1 wo:d oos:2124792
> > >>[>...] sync'ed: 66.3% (2072/6144)M
> > >>finish: 0:06:12 speed: 5,692 (6,668) K/sec
> > >> 3: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B
r-
> > >>   ns:89737 nr:0 dw:439073 dr:2235186 al:724 bm:35 lo:0 pe:45 ua:0
ap:7
> > >> ep:1 wo:d oos:8131104
> > >>[>] sync'ed:  1.6% (7940/8064)M
> > >>finish: 10:44:09 speed: 208 (204) K/sec (stalled)
> > >>
> > >> Is this a known bug and fixed in the further version?
> > >> ___
> > >> drbd-user mailing list
> > >> drbd-user@lists.linbit.com
> > >> http://lists.linbit.com/mailman/listinfo/drbd-user
> > >
> > >
> > > Maybe provide more details about the term "cluster" you are using. Do
you
> > > have DRBD under control of crm like Pacemaker? If so are you running
DRBD in
> > > dual primary mode maybe? And when does this state happen and under
what
> > > conditions i.e restarted a node etc.
>
> What os is this on? Can you please paste the output of "crm status" (or
pcs if you are on rhel7) and "crm_mon -Qrf1"

Also look for errors from crm in syslog and check oracle log too for errors.
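
Something along these lines, depending on the distro the file may be
/var/log/syslog instead:

# grep -Ei 'crmd|lrmd|pengine|error' /var/log/messages | tail -n 50

and the oracle alert log on the node where the stop timed out.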
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] oracle stop timeout while drbd resync

2016-08-31 Thread Igor Cicimov
On 1 Sep 2016 1:16 am, "Mia Lueng"  wrote:
>
> Yes, Oracle & drbd is running under pacemaker just in
> primary/secondary mode. I stopped the oracle resource during DRBD is
> resyncing and the oracle hangup
>
> 2016-08-31 14:38 GMT+08:00 Igor Cicimov :
> >
> >
> > On Wed, Aug 31, 2016 at 3:49 PM, Mia Lueng  wrote:
> >>
> >> Hi:
> >> I have a cluster with four drbd devices. I found oracle stopped
> >> timeout while drbd is in resync state.
> >> oracle is blocked like following:
> >>
> >> oracle6869  6844  0.0  0.0 71424 12616 ?S16:28
> >> 00:00:00 pipe_wait
> >> /oracle/app/oracle/dbhome_1/bin/sqlplus
> >> @/tmp/ora_ommbb_shutdown.sql
> >> oracle6870  6869  0.0  0.1 4431856 26096 ?   Ds   16:28
> >> 00:00:00 get_write_access oracleommbb
> >> (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
> >>
> >>
> >> drbd state
> >>
> >> 2016-08-30 16:33:32 Dump [/proc/drbd] ...
> >> =
> >> version: 8.3.16 (api:88/proto:86-97)
> >> GIT-hash: bbf851ee755a878a495cfd93e1a76bf90dc79442 Makefile.in build
> >> by drbd@build 2012-06-07 16:03:04
> >> 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
> >>   ns:2777568 nr:0 dw:492604 dr:3305833 al:4761 bm:439 lo:31 pe:613
> >> ua:0 ap:31 ep:1 wo:d oos:4144796
> >>[==>.] sync'ed: 35.7% (4044/6280)M
> >>finish: 0:10:19 speed: 6,680 (3,664) K/sec
> >> 1: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent B
r-
> >>   ns:3709600 nr:0 dw:854764 dr:7632085 al:7689 bm:3401 lo:38 pe:3299
> >> ua:38 ap:0 ep:1 wo:d oos:6204676
> >>[===>] sync'ed: 41.5% (6056/10340)M
> >>finish: 0:22:14 speed: 4,640 (10,016) K/sec
> >> 2: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
> >>   ns:3968883 nr:0 dw:127937 dr:5179641 al:190 bm:304 lo:1 pe:139 ua:0
> >> ap:7 ep:1 wo:d oos:2124792
> >>[>...] sync'ed: 66.3% (2072/6144)M
> >>finish: 0:06:12 speed: 5,692 (6,668) K/sec
> >> 3: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
> >>   ns:89737 nr:0 dw:439073 dr:2235186 al:724 bm:35 lo:0 pe:45 ua:0 ap:7
> >> ep:1 wo:d oos:8131104
> >>[>] sync'ed:  1.6% (7940/8064)M
> >>finish: 10:44:09 speed: 208 (204) K/sec (stalled)
> >>
> >> Is this a known bug and fixed in the further version?
> >> ___
> >> drbd-user mailing list
> >> drbd-user@lists.linbit.com
> >> http://lists.linbit.com/mailman/listinfo/drbd-user
> >
> >
> > Maybe provide more details about the term "cluster" you are using. Do
you
> > have DRBD under control of crm like Pacemaker? If so are you running
DRBD in
> > dual primary mode maybe? And when does this state happen and under what
> > conditions i.e restarted a node etc.

What os is this on? Can you please paste the output of "crm status" (or pcs
if you are on rhel7) and "crm_mon -Qrf1"
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] oracle stop timeout while drbd resync

2016-08-31 Thread Igor Cicimov
On Wed, Aug 31, 2016 at 3:49 PM, Mia Lueng  wrote:

> Hi:
> I have a cluster with four drbd devices. I found oracle stopped
> timeout while drbd is in resync state.
> oracle is blocked like following:
>
> oracle6869  6844  0.0  0.0 71424 12616 ?S16:28
> 00:00:00 pipe_wait
> /oracle/app/oracle/dbhome_1/bin/sqlplus
> @/tmp/ora_ommbb_shutdown.sql
> oracle6870  6869  0.0  0.1 4431856 26096 ?   Ds   16:28
> 00:00:00 get_write_access oracleommbb
> (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
>
>
> drbd state
>
> 2016-08-30 16:33:32 Dump [/proc/drbd] ...
> =
> version: 8.3.16 (api:88/proto:86-97)
> GIT-hash: bbf851ee755a878a495cfd93e1a76bf90dc79442 Makefile.in build
> by drbd@build 2012-06-07 16:03:04
> 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
>   ns:2777568 nr:0 dw:492604 dr:3305833 al:4761 bm:439 lo:31 pe:613
> ua:0 ap:31 ep:1 wo:d oos:4144796
>[==>.] sync'ed: 35.7% (4044/6280)M
>finish: 0:10:19 speed: 6,680 (3,664) K/sec
> 1: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent B r-
>   ns:3709600 nr:0 dw:854764 dr:7632085 al:7689 bm:3401 lo:38 pe:3299
> ua:38 ap:0 ep:1 wo:d oos:6204676
>[===>] sync'ed: 41.5% (6056/10340)M
>finish: 0:22:14 speed: 4,640 (10,016) K/sec
> 2: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
>   ns:3968883 nr:0 dw:127937 dr:5179641 al:190 bm:304 lo:1 pe:139 ua:0
> ap:7 ep:1 wo:d oos:2124792
>[>...] sync'ed: 66.3% (2072/6144)M
>finish: 0:06:12 speed: 5,692 (6,668) K/sec
> 3: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent B r-
>   ns:89737 nr:0 dw:439073 dr:2235186 al:724 bm:35 lo:0 pe:45 ua:0 ap:7
> ep:1 wo:d oos:8131104
>[>] sync'ed:  1.6% (7940/8064)M
>finish: 10:44:09 speed: 208 (204) K/sec (stalled)
>
> Is this a known bug and fixed in the further version?
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>

Maybe provide more details about the term "cluster" you are using. Do you
have DRBD under control of crm like Pacemaker? If so are you running DRBD
in dual primary mode maybe? And when does this state happen and under what
conditions i.e restarted a node etc.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD 9.1 ???

2016-07-14 Thread Igor Cicimov
On 15 Jul 2016 9:30 am, "Digimer"  wrote:
>
> On 14/07/16 07:10 PM, Igor Cicimov wrote:
> > Ok, this has been coming for a while now, does anyone know when is the
> > expected 9.1 release date?
>
> 9.0.3 just came out... So far as I know, nothing has been said about a
> release schedule.
>
> What do you mean by "this has been coming for a while now"?
>
I mean a stable, production-ready version 9 with all the goodies announced,
like multiple active/active nodes etc., which should be version 9.1. Right??

> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD 9.1 ???

2016-07-14 Thread Igor Cicimov
Ok, this has been coming for a while now, does anyone know when is the
expected 9.1 release date?
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd 8.4-7-1: Slow rsync rate when crossing WAN network

2016-07-01 Thread Igor Cicimov
On 1 Jul 2016 3:48 pm, "T.J. Yang"  wrote:
>
> Hi All
>
> I am new to drbd performance turning and I have been study (R0).
> Aso I am browsing others effort in drbd-user archive (R1).
> I was able to get 350MB/s rsync rate (R2) for two Centos 7.2 VMs(A and B)
when they are on same LAN with turning from (R1) thread.
>
> My goal is to have C Centos 7.2 VM paired with A VM that go over a fast
 WAN pipe(R3). but when I reuse B's drbd config and change the IP info. The
rsync rate back to 30-40M rsync rate(See R4).
>
> I tried the jumbo frame turning to raise MTU from 1500 to 8000(RH Support
article recommend 8000, not 9000). But this change on VM A and C doesn't
not improve rsync rate.
>
>
> Does networking team  need do any change on their swith/router  for drbd
case ?
>
You need jumbo frames enabled on the switch too.
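
After raising the MTU you can verify the whole path with a non-fragmenting
ping, e.g. for an 8000 byte MTU (eth1 here standing in for the replication
interface):

# ip link show eth1 | grep mtu
# ping -M do -s 7972 <peer-ip>

If the switch ports are still at 1500 the big ping will fail even though
both NICs report MTU 8000.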

>
> References:
> R0: https://www.drbd.org/en/doc/users-guide-84/p-performance
> R1: http://lists.linbit.com/pipermail/drbd-user/2016-January/022611.html
> R2:
>
> #this is between vmA(10.65.184.1) and vmB(10.65.184.3), same subnet.
>
> [root@vmA ~]# ./drbd-pm-test.bash wandk0 # script from R0
>
> testing wandk0 on /dev/drbd1
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 8.00925 s, 67.0 MB/s
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 1.72338 s, 312 MB/s
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 1.84181 s, 291 MB/s
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 1.66079 s, 323 MB/s
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 1.69359 s, 317 MB/s
>
> testing wandk0 on backing device:/dev/centos/wandk0
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 0.504366 s, 1.1 GB/s
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 0.550144 s, 976 MB/s
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 0.502675 s, 1.1 GB/s
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 0.473032 s, 1.1 GB/s
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 0.470139 s, 1.1 GB/s
>
> [root@vmA ~]#
>
>
>
> R3:
>
> [root@vmC ~]# iperf3 -s -p 5900
>
> warning: this system does not seem to support IPv6 - trying IPv4
>
> ---
>
> Server listening on 5900
>
> ---
>
> Accepted connection from 10.65.184.1(vmA), port 56750
>
> [  5] local 10.64.5.245 port 5900 connected to 10.65.184.1 port 56754
>
> [ ID] Interval   Transfer Bandwidth
>
> [  5]   0.00-1.00   sec  73.9 MBytes   620 Mbits/sec
>
> [  5]   1.00-2.00   sec   100 MBytes   842 Mbits/sec
>
> [  5]   2.00-3.00   sec   107 MBytes   895 Mbits/sec
>
> [  5]   3.00-4.00   sec   113 MBytes   947 Mbits/sec
>
> [  5]   4.00-5.00   sec   117 MBytes   984 Mbits/sec
>
> [  5]   5.00-6.00   sec   120 MBytes  1.01 Gbits/sec
>
> [  5]   6.00-7.00   sec   123 MBytes  1.03 Gbits/sec
>
> [  5]   7.00-8.00   sec   124 MBytes  1.04 Gbits/sec
>
> [  5]   8.00-9.00   sec   124 MBytes  1.04 Gbits/sec
>
> [  5]   9.00-10.00  sec   125 MBytes  1.04 Gbits/sec
>
> [  5]  10.00-10.04  sec  5.25 MBytes  1.25 Gbits/sec
>
> - - - - - - - - - - - - - - - - - - - - - - - - -
>
> [ ID] Interval   Transfer Bandwidth
>
> [  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec  sender
>
> [  5]   0.00-10.04  sec  1.11 GBytes   947 Mbits/sec
 receiver
>
> ---
>
> Server listening on 5900
>
> ---
>
> ^Ciperf3: interrupt - the server has terminated
>
> [root@vmC ~]# date
>
> Wed Jun 29 12:26:37 EDT 2016
>
> [root@vmC ~]#
>
>
>
> R4: only getting 35MB/s when cross WAN network.
>
> [root@vmA ~]# ./scratch-test.bash wandk0 # from R1
>
> 1: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-
>
> Writing via wandk0 on /dev/drbd1
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 15.1859 s, 35.4 MB/s
>
> 
>
> 536870912 bytes (537 MB) copied, 15.598 s, 34.4 MB/s
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 16.9145 s, 31.7 MB/s
>
> Writing directly into backing device:/dev/centos/wandk0
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 0.625202 s, 859 MB/s
>
> 
>
> 1+0 records in
>
> 1+0 records out
>
> 536870912 bytes (537 MB) copied, 0.566128 s, 948 MB/s
>
> [root@vmA ~]# date
>
> Thu Jun 30 13:07:16 EDT 2016
>
> [root@vmA ~]#
>
>
>
> --
> T.J. Yang
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Performance issues with drbd and nfs

2016-06-10 Thread Igor Cicimov
On 7 Jun 2016 3:18 pm, "Stephano-Shachter, Dylan" 
wrote:
>
> Hello all,
>
> I am building an HA NFS server using drbd and pacemaker. Everything is
working well except I am getting lower write speeds than I would expect. I
have been doing all of my benchmarking with bonnie++. I always get read
speeds of about 112 MB/s which is just about saturating the network. When I
perform a write, however, I get about 89 MB/s which is significantly slower.
>
> The weird thing is that if I run the test locally, on the server (not
using nfs), I get 112 MB/s read. Also, if I run the tests over nfs but with
the secondary downed via "drbdadm down name", then I also get 112 MB/s.

This is confusing, you are just saying that the reads are the same with
drbd plus nfs and without. Or did you mean writes here? What does locally
mean? A different partition without drbd? Or drbd without nfs? Nothing in
drbd is local, it is block-level replicated storage.

> I can't understand what is causing the bottleneck if it is not drbd
> replication or nfs.
>

How exactly are you testing, and what is the physical disk, meaning raid or
not? Is this a virtual or bare-metal server?
The reads are faster due to caching, so did you account for that in your
read test, i.e. reading a file at least twice the RAM size?

Not exactly an answer, just trying to get some more info about your setup.
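
For example, to take the page cache out of the picture between read runs
you can do something like this (the path is just an example, the file
should be at least 2x RAM):

# sync; echo 3 > /proc/sys/vm/drop_caches
# dd if=/path/to/nfs/testfile of=/dev/null bs=1M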

> If anyone could help me to figure out what is slowing down the write
performance if would be very helpful. My configs are
>
>
> drbd-config-
>
>
> # /etc/drbd.conf
> global {
> usage-count yes;
> cmd-timeout-medium 600;
> cmd-timeout-long 0;
> }
>
> common {
> net {
> protocol   C;
> after-sb-0pridiscard-zero-changes;
> after-sb-1pridiscard-secondary;
> after-sb-2pridisconnect;
> max-buffers  8000;
> max-epoch-size   8000;
> }
> disk {
> resync-rate  1024M;
> }
> handlers {
> pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
reboot -f";
> pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
reboot -f";
> local-io-error   "/usr/lib/drbd/notify-io-error.sh;
/usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ;
halt -f";
> split-brain  "/usr/lib/drbd/notify-split-brain.sh root";
> }
> }
>
> # resource  on : not ignored, not stacked
> # defined at /etc/drbd.d/.res:1
> resource  {
> on  {
> device   /dev/drbd1 minor 1;
> disk /dev/sdb1;
> meta-diskinternal;
> address  ipv4 55.555.55.55:7789;
> }
> on  {
> device   /dev/drbd1 minor 1;
> disk /dev/sdb1;
> meta-diskinternal;
> address  ipv4 55.555.55.55:7789;
> }
> net {
> allow-two-primaries  no;
> after-sb-0pridiscard-zero-changes;
> after-sb-1pridiscard-secondary;
> after-sb-2pridisconnect;
> }
> }
>
>
>
> ---nfs.conf-
>
>
>
> MOUNTD_NFS_V3="yes"
> RPCNFSDARGS="-N 2"
> LOCKD_TCPPORT=32803
> LOCKD_UDPPORT=32769
> MOUNTD_PORT=892
> RPCNFSDCOUNT=48
> #RQUOTAD_PORT=875
> #STATD_PORT=662
> #STATD_OUTGOING_PORT=2020
> STATDARG="--no-notify"
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD 9 Peack CPU load

2016-06-03 Thread Igor Cicimov
   if
> (connection->ctx.ctx_peer_node_id != peer_device->ctx.ctx_peer_node_id
> +   || device->ctx.ctx_volume
> != peer_device->ctx.ctx_volume)
> +   continue;
> +   printI("_conn_name=%s\n",
> connection->ctx.ctx_conn_name);
> +   printI("_peer_node_id=%d\n",
> connection->ctx.ctx_peer_node_id);
> +   printI("_cstate=%s\n",
> drbd_conn_str(connection->info.conn_connection_state));
> +   if
> (connection->info.conn_connection_state == C_CONNECTED) {
> +   printI("_peer=%s\n",
> drbd_role_str(connection->info.conn_role));
> +   printI("_pdsk=%s\n\n",
> drbd_disk_str(peer_device->info.peer_disk_state));
> +   } else {
> +   printI("_peer=\n");
> +   printI("_pdsk=\n");
> +   }
> +   wrap_printf(0,
> "_peer_node_process\n\n");
> +   }
> +   //Dummy
> +   //printI("_flags_susp==%s\n", xxx);
> +   //...
> +   --indent;
> +   }
> +
> +   wrap_printf(0, "_sh_status_process\n\n");
> +   --indent;
> +   }
> +
> +   free_connections(connections);
> +   free_devices(devices);
> +   free_peer_devices(peer_devices);
> +   }
> +
> +   free(resources_list);
> +   objname = old_objname;
> +   return 0;
> +}
> +
>  static int cstate_cmd(struct drbd_cmd *cm, int argc, char **argv)
>  {
> struct connections_list *connections, *connection;
>
>
>
>
> On 30 May 2016 at 13:36, Igor Cicimov  wrote:
>
>
>
> On Tue, May 17, 2016 at 8:21 AM, Mats Ramnefors 
> wrote:
>
>> I am testing a DRBD 9 and 8.4 in simple 2 node active - passive clusters
>> with NFS.
>>
>> Copying files form a third server to the NFS share using dd, I typically
>> see an average of 20% CPU load (with v9) on the primary during transfer of
>> larger files, testing with 0,5 and 2 GB.
>>
>> At the very end of the transfer DRBD process briefly peaks at 70 - 100%
>> CPU.
>>
>> This causes occasional problems with Corosync believing the node is down.
>> Increasing the token time for Corosync to 2000 ms fixes the symptom but I
>> am wondering about the root cause and any possible fixes?
>>
>> This is the DRBD configuration.
>>
>> resource san_data {
>>   protocol C;
>>   meta-disk internal;
>>   device /dev/drbd1;
>>   disk   /dev/nfs/share;
>>   net {
>> verify-alg sha1;
>> cram-hmac-alg sha1;
>> shared-secret ”";
>> after-sb-0pri discard-zero-changes;
>> after-sb-1pri discard-secondary;
>> after-sb-2pri disconnect;
>>   }
>>   on san1 {
>> address  192.168.1.86:7789;
>>   }
>>   on san2 {
>> address  192.168.1.87:7789;
>>   }
>> }
>>
>> The nodes are two VM on different ESXi hosts (Dell T620). Hosts are very
>> lightly loaded. Network is 1 Gb at the moment through a Catalyst switch.
>> Network appears not saturated.
>>
>> BTW when can we expect a DRBD resource agent for v9? It took me a while
>> to figure out why DRBD9 was not working with Pacemaker and then finding a
>> patch to the agent :)
>>
>> Cheers Mats
>> ___
>> drbd-user mailing list
>> drbd-user@lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
> Hi Mats,
>
> Can you please share the patch if you don't mind?
>
> Thanks,
> Igor
>
>
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD 9 Peack CPU load

2016-05-30 Thread Igor Cicimov
On Tue, May 17, 2016 at 8:21 AM, Mats Ramnefors  wrote:

> I am testing a DRBD 9 and 8.4 in simple 2 node active - passive clusters
> with NFS.
>
> Copying files form a third server to the NFS share using dd, I typically
> see an average of 20% CPU load (with v9) on the primary during transfer of
> larger files, testing with 0,5 and 2 GB.
>
> At the very end of the transfer DRBD process briefly peaks at 70 - 100%
> CPU.
>
> This causes occasional problems with Corosync believing the node is down.
> Increasing the token time for Corosync to 2000 ms fixes the symptom but I
> am wondering about the root cause and any possible fixes?
>
> This is the DRBD configuration.
>
> resource san_data {
>   protocol C;
>   meta-disk internal;
>   device /dev/drbd1;
>   disk   /dev/nfs/share;
>   net {
> verify-alg sha1;
> cram-hmac-alg sha1;
> shared-secret ”";
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
>   }
>   on san1 {
> address  192.168.1.86:7789;
>   }
>   on san2 {
> address  192.168.1.87:7789;
>   }
> }
>
> The nodes are two VM on different ESXi hosts (Dell T620). Hosts are very
> lightly loaded. Network is 1 Gb at the moment through a Catalyst switch.
> Network appears not saturated.
>
> BTW when can we expect a DRBD resource agent for v9? It took me a while to
> figure out why DRBD9 was not working with Pacemaker and then finding a
> patch to the agent :)
>
> Cheers Mats
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


Hi Mats,

Can you please share the patch if you don't mind?

Thanks,
Igor
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD9 - ASSERTION FAILED: pp_in_use

2016-04-23 Thread Igor Cicimov
Did you read this?
https://www.drbd.org/en/doc/users-guide-90/s-enable-dual-primary
On 24 Apr 2016 6:03 am, "Ml Ml"  wrote:

> Hello List,
>
> i am running DRBD9 with 4.2.8-1-pve (Proxmox 4.1) and in syslog i get:
>
> Apr 23 21:50:07 node02 kernel: [ 5018.813223] drbd vm-101-disk-1
> node01: ASSERTION FAILED: pp_in_use: -12 < 0
> Apr 23 21:50:07 node02 kernel: [ 5018.813379] drbd vm-104-disk-1
> node01: ASSERTION FAILED: pp_in_use: -2 < 0
> Apr 23 21:50:07 node02 kernel: [ 5018.913334] drbd vm-101-disk-1
> node01: ASSERTION FAILED: pp_in_use: -12 < 0
> Apr 23 21:50:07 node02 kernel: [ 5018.913509] drbd vm-104-disk-1
> node01: ASSERTION FAILED: pp_in_use: -2 < 0
> Apr 23 21:50:07 node02 kernel: [ 5019.013381] drbd vm-101-disk-1
> node01: ASSERTION FAILED: pp_in_use: -12 < 0
>
> And this quite a lot:
> grep "ASSERTION FAILED" -c /var/log/syslog
> 74076
>
> Any idea whats broken here?
> Do you need any more infos?
>
> Thanks,
> Mario
>
>
> # /etc/drbd.conf
> global {
> usage-count yes;
> }
>
> common {
> }
>
> # resource .drbdctrl on node02: not ignored, not stacked
> # defined at /etc/drbd.d/drbdctrl.res:1
> resource .drbdctrl {
> volume 0 {
> device   minor 0;
> disk /dev/drbdpool/.drbdctrl_0;
> meta-diskinternal;
> }
> volume 1 {
> device   minor 1;
> disk /dev/drbdpool/.drbdctrl_1;
> meta-diskinternal;
> }
> on node02 {
> node-id 1;
> address  ipv4 192.168.113.2:6999;
> }
> on node03 {
> node-id 2;
> address  ipv4 192.168.113.3:6999;
> }
> on node01 {
> node-id 0;
> address  ipv4 192.168.113.1:6999;
> }
> connection-mesh {
> hosts node02 node03 node01;
> net {
> protocol   C;
> }
> }
> net {
> cram-hmac-algsha256;
> shared-secrettza2VB63DEJWCsk/kZY6;
> allow-two-primaries  no;
> }
> }
>
> # resource vm-100-disk-1 on node02: not ignored, not stacked
> # defined at /var/lib/drbd.d/drbdmanage_vm-100-disk-1.res:2
> resource vm-100-disk-1 {
> template-file /var/lib/drbd.d/drbdmanage_global_common.conf;
> on node02 {
> node-id 0;
> volume 0 {
> disk {
> size 78643200k;
> }
> device   minor 101;
> disk /dev/drbdpool/vm-100-disk-1_00;
> meta-diskinternal;
> }
> address  ipv4 192.168.113.2:7001;
> }
> on node03 {
> node-id 1;
> volume 0 {
> disk {
> size 78643200k;
> }
> device   minor 101;
> disk /dev/drbdpool/vm-100-disk-1_00;
> meta-diskinternal;
> }
> address  ipv4 192.168.113.3:7001;
> }
> on node01 {
> node-id 2;
> volume 0 {
> disk {
> size 78643200k;
> }
> device   minor 101;
> disk /dev/drbdpool/vm-100-disk-1_00;
> meta-disk internal;
> }
> address  ipv4 192.168.113.1:7001;
> }
> connection-mesh {
> hosts node02 node03 node01;
> }
> net {
> allow-two-primaries yes;
> shared-secret"moxL2b+nErKykEfUGi2Z";
> cram-hmac-algsha1;
> }
> }
>
> # resource vm-101-disk-1 on node02: not ignored, not stacked
> # defined at /var/lib/drbd.d/drbdmanage_vm-101-disk-1.res:2
> resource vm-101-disk-1 {
> template-file /var/lib/drbd.d/drbdmanage_global_common.conf;
> on node02 {
> node-id 0;
> volume 0 {
> disk {
> size 209715200k;
> }
> device   minor 102;
> disk /dev/drbdpool/vm-101-disk-1_00;
> meta-disk internal;
> }
> address  ipv4 192.168.113.2:7002;
> }
> on node03 {
> node-id 1;
> volume 0 {
> disk {
> size 209715200k;
> }
> device   minor 102;
> disk /dev/drbdpool/vm-101-disk-1_00;
> meta-disk internal;
> }
> address  ipv4 192.168.113.3:7002;
> }
> on node01 {
> node-id 2;
> volume 0 {
> disk {
> size 209715200k;
> }
> device   minor 102;
> disk /dev/drbdpool/vm-101-disk-1_00;
> meta-disk internal;
> }
> address  ipv4 192.168.113.1:7002;
> }
> connection-mesh {
> hosts node02 node03 node01;
> }
>
> net {
> max-epoch-size   8000;
> max-buffers  8000;
> shared-secret Y5g0NB8QKFcs1R5p7XAA;
> allow-two-primaries yes;
> sndbuf-size 0;
> 

Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-29 Thread Igor Cicimov
On 01/03/2016 3:08 AM, "Eric Robinson"  wrote:
>
> >> That approach does not really work because if you stop resource
> >> p_mysql_002 (for example) then all the other resources in the group
stop too!
> >>
> > Still dont understand whats your problem with that.
>
> Each mysql instance is for a different customer. If I need to stop or
restart one customer's database, or remove the resource from the group
(perhaps because I lost the customer or I need to move their database to a
different server for load distribution), then I don't want all the other
customers' databases going down too!

It's just a matter of design; the way you have it set up at the moment will not
work, of course. If you have them set up as in my example:

group g_drbd0 p_lvm_drbd0 p_fs_clust1 p_vip_clust1 p_mysql_001
group g_drbd1 p_lvm_drbd1 p_fs_clust2 p_vip_clust2 p_mysql_002
.
.
.
group g_drbd99 p_lvm_drbd99 p_fs_clust100 p_vip_clust100 p_mysql_100

then each mysql instance gets its own VIP instead of a shared one as in your
case. Then, during migration, you move the group; the backing drbd device
gets promoted to master on the other node and the whole group follows.
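
For example, each group would then be tied to its own drbd master with the
usual constraint pattern (ms_drbd0/ms_drbd1 being the assumed master/slave
drbd resources backing the two groups):

colocation c_drbd0 inf: g_drbd0 ms_drbd0:Master
order o_drbd0 inf: ms_drbd0:promote g_drbd0:start
colocation c_drbd1 inf: g_drbd1 ms_drbd1:Master
order o_drbd1 inf: ms_drbd1:promote g_drbd1:start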

There are many ways to skin the cat.

>
> --Eric
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Connect problem with sdp when using DRBD8.4

2016-02-28 Thread Igor Cicimov
On 29/02/2016 2:00 AM, "翟果"  wrote:
>
> Hello,All:
> I used to google for the solution,but get no answers.
> Somebody says DRBD8.4 doesn't work with sdp? Really?
As far as I can see that's not true:
http://drbd.linbit.com/users-guide-8.4/s-replication-transports.html
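
For reference, the guide simply prefixes the endpoint address with the
transport keyword; a rough sketch with an illustrative IP, keeping your disk
and device:

on node1 {
  device    /dev/drbd0;
  disk      /dev/mapper/p1-lv1;
  address   sdp 10.1.1.31:7789;
  meta-disk internal;
}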

> I have two nodes (CentOS 6.4 with kernel 2.6.32-358.el6.x86_64) and
two Mellanox IB cards. Now I want to use DRBD to sync data between the two
nodes, but the DRBD status is always "WFConnection".
>
> [root@node1 home]# cat /proc/drbd
> version: 8.4.2 (api:1/proto:86-101)
> GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by
root@localhost.localdomain, 2013-09-30 16:29:29
>  0: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/Outdated C rs
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f
oos:10485404
>
>
> I have loaded ib_sdp.ko and preloaded libsdp.so
>
> [root@node36 ofa_kernel-3.1]# lsmod | grep sdp
> ib_sdp 129946  0
> rdma_cm 43237  2 rdma_ucm,ib_sdp
> ib_core   126865  13
rdma_ucm,ib_ucm,ib_sdp,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_uverbs,ib_umad,mlx5_ib,mlx4_ib,ib_sa,ib_mad
> mlx_compat 32626  18
rdma_ucm,ib_ucm,ib_sdp,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_uverbs,ib_umad,mlx5_ib,mlx5_core,mlx4_en,mlx4_ib,ib_sa,ib_mad,ib_core,ib_addr,mlx4_core
> ipv6  321422  41
ib_sdp,ib_ipoib,ib_core,ib_addr,ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6
>
> [root@node1 ofa_kernel-3.1]# export
> declare -x G_BROKEN_FILENAMES="1"
> declare -x HISTCONTROL="ignoredups"
> declare -x HISTSIZE="1000"
> declare -x HOME="/root"
> declare -x HOSTNAME="node1"
> declare -x LANG="en_US.UTF-8"
> declare -x LD_PRELOAD="/usr/lib64/libsdp.so"
> declare -x LESSOPEN="|/usr/bin/lesspipe.sh %s"
> declare -x LIBSDP_CONFIG_FILE="/etc/libsdp.conf"
> declare -x LOGNAME="root"
>
> Here is part of my configure:
> resource res_sdp {
> meta-disk internal;
> disk /dev/mapper/p1-lv1;
> device /dev/drbd0;
>
> on node1 {
> address sdp *:7789;
> }
> on node36 {
> address sdp *:7789;
> }
> }
>
> And I get some log of ib_sdp.ko.
>
> Feb 28 15:24:20 node1 kernel: sdp_cma_handler:657 sdp_sock( 1465:0
58262:7789): event: RDMA_CM_EVENT_CONNECT_RESPONSE handled
> Feb 28 15:24:20 node1 kernel: sdp_cma_handler:671 sdp_sock( 1465:0
58262:7789): event: RDMA_CM_EVENT_CONNECT_RESPONSE done. status 0
> Feb 28 15:24:27 node1 kernel: sdp_cma_handler:438 sdp_sock( 1465:0
7789:0): event: RDMA_CM_EVENT_CONNECT_REQUEST
> Feb 28 15:24:27 node1 kernel: sdp_connect_handler:178 sdp_sock( 1465:0
7789:0): sdp_connect_handler 88086ee7ac00 -> 88086ef41000
> Feb 28 15:24:27 node1 kernel: sdp_init_sock:1325 sdp_sock( 1465:0
7789:0): sdp_init_sock
> Feb 28 15:24:27 node1 kernel: sdp_init_qp:111 sdp_sock( 1465:0
7789:41102): sdp_init_qp
> Feb 28 15:24:27 node1 kernel: sdp_init_qp:114 sdp_sock( 1465:0
7789:41102): Max sges: 32
> Feb 28 15:24:27 node1 kernel: sdp_init_qp:117 sdp_sock( 1465:0
7789:41102): Setting max send sge to: 9
> Feb 28 15:24:27 node1 kernel: sdp_init_qp:120 sdp_sock( 1465:0
7789:41102): Setting max recv sge to: 9
> Feb 28 15:24:27 node1 kernel: sdp_init_qp:151 sdp_sock( 1465:0
7789:41102): sdp_init_qp done
> Feb 28 15:24:27 node1 kernel: _sdp_exch_state:559 sdp_sock( 1465:0
7789:41102): sdp_connect_handler:300 - set state: TCP_LISTEN ->
TCP_SYN_RECV 0x480
> Feb 28 15:24:27 node1 kernel: sdp_cma_handler:657 sdp_sock( 1465:0
7789:0): event: RDMA_CM_EVENT_CONNECT_REQUEST handled
> Feb 28 15:24:27 node1 kernel: sdp_cma_handler:671 sdp_sock( 1465:0
7789:0): event: RDMA_CM_EVENT_CONNECT_REQUEST done. status 0
> Feb 28 15:24:28 node1 kernel: sdp_cma_handler:438 sdp_sock( 1465:0
58262:7789): event: RDMA_CM_EVENT_DISCONNECTED
> Feb 28 15:24:28 node1 kernel: _sdp_exch_state:559 sdp_sock( 1465:0
58262:7789): sdp_set_error:591 - set state: TCP_ESTABLISHED -> TCP_CLOSE
0x
> Feb 28 15:24:28 node1 kernel: sdp_disconnected_handler:400 sdp_sock(
1465:0 58262:7789): sdp_disconnected_handler
> Feb 28 15:24:28 node1 kernel: sdp_cma_handler:657 sdp_sock( 1465:0
58262:7789): event: RDMA_CM_EVENT_DISCONNECTED handled
> Feb 28 15:24:28 node1 kernel: sdp_reset_sk:492 sdp_sock( 1465:0
58262:7789): sdp_reset_sk
> Feb 28 15:24:28 node1 kernel: sdp_reset_sk:501 sdp_sock( 1465:0
58262:7789): setting state to error
> Feb 28 15:24:28 node1 kernel: _sdp_exch_state:559 sdp_sock( 1465:0
58262:7789): sdp_set_error:591 - set state: TCP_CLOSE -> TCP_CLOSE
0x
> Feb 28 15:24:28 node1 kernel: sdp_cma_handler:671 sdp_sock( 1465:0
58262:7789): event: RDMA_CM_EVENT_DISCONNECTED done. status -104
> Feb 28 15:24:28 node1 kernel: sdp_destroy_work:1238 sdp_sock( 1615:6
58262:7789): sdp_destroy_work: refcnt 2
> Feb 28 15:24:28 node1 kernel: sdp_do_posts:816 sdp_sock( 1614:11
58262:7789): QP is deactivated
> Feb 28 15:24:28 node1 kernel: sdp_do_posts:816 sdp_sock( 1614:11
58262:7789): QP is deactivated
> Feb 28 15:24:28 node1 kernel: sdp_destroy_qp:242 sdp_sock( 1615:6
58262:

Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-28 Thread Igor Cicimov
On Mon, Feb 29, 2016 at 7:52 AM, Igor Cicimov  wrote:

>
> On 28/02/2016 1:19 PM, "Eric Robinson"  wrote:
> >
> > > That's exactly what this configuration gives you right? Each group is
> collocated
> > > with one and only one drbd device on the master node. Regarding
> starting/stopping of
> > > the resources tied up together in the same group. I guess after adding
> MySQL
> > > the user case would be:
> >
> > >group g_drbd0 p_lvm_drbd0 p_fs_clust17 p_vip_clust17 p_mysql_001
> p_mysql_002 ...
> >
> > That approach does not really work because if you stop resource
> p_mysql_002 (for example) then all the other resources in the group stop
> too!
> >
> Still dont understand whats your problem with that.
>
> > >> And what was wrong with my constraints?
> > > You had left the Filesystem out of the picture.
> >
> > You're right. I added the filesystem back into the picture and now it
> works either way, with my original constraints or yours, at least for drbd,
> lvm, the filesystem, and the VIP. However, it still does not work for the
> mysql resources. As soon as I add the mysql resources, then when I fail
> over, everything goes to crap.
> >
> > On my other clusters, my constrains look like the following and they
> work great. I can stop and start any mysql resource without affecting any
> other resource, but all the mysql resources are still dependent on the
> underlying ones (drbd, fs, vip).
> >
> > Here's a working sample from one of my other clusters...
> >
> > colocation c_clust10 inf: ( p_mysql_001 p_mysql_003 p_mysql_043
> p_mysql_075 p_mysql_092 ) p_vip_clust10 p_fs_clust10 ms_drbd0:Master
>
Ah, the VIP of course, as I suspected; yep, you can't use groups in this case.

> > colocation c_clust11 inf: ( p_mysql_124 p_mysql_098 p_mysql_287
> p_mysql_346 p_mysql_685 ) p_vip_clust11 p_fs_clust11 ms_drbd1:Master
> > order o_clust10 inf: ms_drbd0:promote p_fs_clust10 p_vip_clust10 (
> p_mysql_001 p_mysql_003 p_mysql_043 p_mysql_075 p_mysql_092 )
> > order o_clust11 inf: ms_drbd1:promote p_fs_clust11 p_vip_clust11 (
> p_mysql_124 p_mysql_098 p_mysql_287 p_mysql_346 p_mysql_685 )
> >
> > However, that did not work with this newer cluster. It looks like there
> has been a syntactical change in the CRM. The following approach does work.
> Note the different usage of parenthesis.
> >
> > colocation c_clust17 inf: ( p_mysql_557 p_mysql_690 p_vip_clust17
> p_fs_clust17 p_lvm_drbd0 ) ms_drbd0:Master
> > colocation c_clust18 inf: ( p_vip_clust18 p_fs_clust18 p_lvm_drbd1 )
> ms_drbd1:Master
> > order o_clust17 inf: ms_drbd0:promote ( p_lvm_drbd0:start ) (
> p_fs_clust17 p_vip_clust17 ) ( p_mysql_557 p_mysql_690 )
> > order o_clust18 inf: ms_drbd1:promote ( p_lvm_drbd1:start ) (
> p_fs_clust18 p_vip_clust18 )
> >
> > --Eric
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-28 Thread Igor Cicimov
On 28/02/2016 1:19 PM, "Eric Robinson"  wrote:
>
> > That's exactly what this configuration gives you right? Each group is
collocated
> > with one and only one drbd device on the master node. Regarding
starting/stopping of
> > the resources tied up together in the same group. I guess after adding
MySQL
> > the user case would be:
>
> >group g_drbd0 p_lvm_drbd0 p_fs_clust17 p_vip_clust17 p_mysql_001
p_mysql_002 ...
>
> That approach does not really work because if you stop resource
p_mysql_002 (for example) then all the other resources in the group stop
too!
>
Still don't understand what your problem with that is.

> >> And what was wrong with my constraints?
> > You had left the Filesystem out of the picture.
>
> You're right. I added the filesystem back into the picture and now it
works either way, with my original constraints or yours, at least for drbd,
lvm, the filesystem, and the VIP. However, it still does not work for the
mysql resources. As soon as I add the mysql resources, then when I fail
over, everything goes to crap.
>
> On my other clusters, my constrains look like the following and they work
great. I can stop and start any mysql resource without affecting any other
resource, but all the mysql resources are still dependent on the underlying
ones (drbd, fs, vip).
>
> Here's a working sample from one of my other clusters...
>
> colocation c_clust10 inf: ( p_mysql_001 p_mysql_003 p_mysql_043
p_mysql_075 p_mysql_092 ) p_vip_clust10 p_fs_clust10 ms_drbd0:Master
> colocation c_clust11 inf: ( p_mysql_124 p_mysql_098 p_mysql_287
p_mysql_346 p_mysql_685 ) p_vip_clust11 p_fs_clust11 ms_drbd1:Master
> order o_clust10 inf: ms_drbd0:promote p_fs_clust10 p_vip_clust10 (
p_mysql_001 p_mysql_003 p_mysql_043 p_mysql_075 p_mysql_092 )
> order o_clust11 inf: ms_drbd1:promote p_fs_clust11 p_vip_clust11 (
p_mysql_124 p_mysql_098 p_mysql_287 p_mysql_346 p_mysql_685 )
>
> However, that did not work with this newer cluster. It looks like there
has been a syntactical change in the CRM. The following approach does work.
Note the different usage of parenthesis.
>
> colocation c_clust17 inf: ( p_mysql_557 p_mysql_690 p_vip_clust17
p_fs_clust17 p_lvm_drbd0 ) ms_drbd0:Master
> colocation c_clust18 inf: ( p_vip_clust18 p_fs_clust18 p_lvm_drbd1 )
ms_drbd1:Master
> order o_clust17 inf: ms_drbd0:promote ( p_lvm_drbd0:start ) (
p_fs_clust17 p_vip_clust17 ) ( p_mysql_557 p_mysql_690 )
> order o_clust18 inf: ms_drbd1:promote ( p_lvm_drbd1:start ) (
p_fs_clust18 p_vip_clust18 )
>
> --Eric
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-26 Thread Igor Cicimov
On Sat, Feb 27, 2016 at 5:05 PM, Igor Cicimov  wrote:

>
> On 27/02/2016 4:10 pm, "Eric Robinson"  wrote:
> >
> > > Can you please try following constraints instead the ones you have:
> >
> > > group g_drbd0 p_lvm_drbd0 p_fs_clust17 p_vip_clust17
> > > group g_drbd1 p_lvm_drbd1 p_fs_clust18 p_vip_clust18
> > > colocation c_clust17 inf: g_drbd0 ms_drbd0:Master
> > > colocation c_clust18 inf: g_drbd1 ms_drbd1:Master
> > > order o_clust17 inf: ms_drbd0:promote g_drbd0:start
> > > order o_clust18 inf: ms_drbd1:promote g_drbd1:start
> >
> > Those constraints seem to be working great. I can move the g_drbd0 and
> g_drbd1 resource groups freely back and forth between the nodes. However,
> this creates a problem for me because there will be a number of mysql
> resources on the servers (p_mysql_001, p_mysql_002, p_mysql_003,
> p_mysql_004, etc.).  In the past when I used resource groups, stopping any
> one mysql resource stopped all of the others, and removing one mysql
> resource from the group did the same thing. That's why I stopped using
> groups.
> >
> That's exactly what this configuration gives you right? Each group is
> collocated with one and only one drbd device on the master node.
>
Regarding starting/stopping of the resources tied together in the same
group: I guess after adding MySQL the use case would be:

group g_drbd0 p_lvm_drbd0 p_fs_clust17 p_vip_clust17 p_mysql_001

Well, all resources in the group are dedicated to the mysql resource, so
stopping them together with mysql should not be an issue at all, apart
from the fact that they have to be started again: the IP, the LVM and the
filesystem, together with mysql, which takes a second to execute in case you
are concerned about latency. What's the purpose of those resources running
without their top mysql resource anyway?

However, if the mysql VIP in the group is being used by something else on
the server then yes you might have a problem with this. But as long as you
keep everything grouped and isolated I don't see what might go wrong.

> > I don't mind going back to using groups for the p_lvm, p_fs and p_vip
> resources, but the end result must be this: each of the mysql resources
> should be dependent on one of the g_drbd resource groups, but  independent
> of all the other mysql resources. What syntax should I use for that?
> >
> > And what was wrong with my constraints?
> You had left the Filesystem out of the picture. If you look carefully you
> will see that really the only difference between mine and yours
> constraints, apart from groups, is the filesystem primitives.
>
> Those work on my other clusters (but none of the others have LVM on drbd).
> >
> > --Eric
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-26 Thread Igor Cicimov
On 27/02/2016 4:10 pm, "Eric Robinson"  wrote:
>
> > Can you please try following constraints instead the ones you have:
>
> > group g_drbd0 p_lvm_drbd0 p_fs_clust17 p_vip_clust17
> > group g_drbd1 p_lvm_drbd1 p_fs_clust18 p_vip_clust18
> > colocation c_clust17 inf: g_drbd0 ms_drbd0:Master
> > colocation c_clust18 inf: g_drbd1 ms_drbd1:Master
> > order o_clust17 inf: ms_drbd0:promote g_drbd0:start
> > order o_clust18 inf: ms_drbd1:promote g_drbd1:start
>
> Those constraints seem to be working great. I can move the g_drbd0 and
g_drbd1 resource groups freely back and forth between the nodes. However,
this creates a problem for me because there will be a number of mysql
resources on the servers (p_mysql_001, p_mysql_002, p_mysql_003,
p_mysql_004, etc.).  In the past when I used resource groups, stopping any
one mysql resource stopped all of the others, and removing one mysql
resource from the group did the same thing. That's why I stopped using
groups.
>
That's exactly what this configuration gives you right? Each group is
collocated with one and only one drbd device on the master node.

> I don't mind going back to using groups for the p_lvm, p_fs and p_vip
resources, but the end result must be this: each of the mysql resources
should be dependent on one of the g_drbd resource groups, but  independent
of all the other mysql resources. What syntax should I use for that?
>
> And what was wrong with my constraints?
You had left the Filesystem out of the picture. If you look carefully you
will see that really the only difference between mine and yours
constraints, apart from groups, is the filesystem primitives.

Those work on my other clusters (but none of the others have LVM on drbd).
>
> --Eric
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-26 Thread Igor Cicimov
On Sat, Feb 27, 2016 at 11:18 AM, Eric Robinson 
wrote:

> Sadly, it still isn’t working.
>
> Here is my crm config...
>
> node ha13a
> node ha13b
> primitive p_drbd0 ocf:linbit:drbd \
> params drbd_resource=ha01_mysql \
> op monitor interval=31s role=Slave \
> op monitor interval=30s role=Master
> primitive p_drbd1 ocf:linbit:drbd \
> params drbd_resource=ha02_mysql \
> op monitor interval=29s role=Slave \
> op monitor interval=28s role=Master
> primitive p_fs_clust17 Filesystem \
> params device="/dev/vg_drbd0/lv_drbd0" directory="/ha01_mysql"
> fstype=ext3 options=noatime
> primitive p_fs_clust18 Filesystem \
> params device="/dev/vg_drbd1/lv_drbd1" directory="/ha02_mysql"
> fstype=ext3 options=noatime
> primitive p_lvm_drbd0 LVM \
> params volgrpname=vg_drbd0
> primitive p_lvm_drbd1 LVM \
> params volgrpname=vg_drbd1
> primitive p_vip_clust17 IPaddr2 \
> params ip=192.168.9.104 cidr_netmask=32 \
> op monitor interval=30s
> primitive p_vip_clust18 IPaddr2 \
> params ip=192.168.9.105 cidr_netmask=32 \
> op monitor interval=30s
> ms ms_drbd0 p_drbd0 \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true target-role=Master
> ms ms_drbd1 p_drbd1 \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true target-role=Master
> colocation c_clust17 inf: p_vip_clust17 p_lvm_drbd0 ms_drbd0:Master
> colocation c_clust18 inf: p_vip_clust18 p_lvm_drbd1 ms_drbd1:Master
> order o_clust17 inf: ms_drbd0:promote p_lvm_drbd0:start p_vip_clust17
> order o_clust18 inf: ms_drbd1:promote p_lvm_drbd1:start p_vip_clust18
> property cib-bootstrap-options: \
> dc-version=1.1.11-97629de \
> cluster-infrastructure="classic openais (with plugin)" \
> no-quorum-policy=ignore \
> stonith-enabled=false \
> maintenance-mode=false \
> expected-quorum-votes=2 \
> last-lrm-refresh=1456529727
> # vim: set filetype=pcmk:
>
> Here is what my filter looks like...
>
> filter = [ "a|/dev/sda*|", "a|/dev/drbd*|", "r|.*|" ]
> write_cache_state = 0
> volume_list = [ "vg00", "vg_drbd0", "vg_drbd1" ]
>
>
> Here is what lvdisplay shows on node ha13a...
>
>  [root@ha13a ~]# lvdisplay
>   --- Logical volume ---
>   LV Path                /dev/vg00/lv00
>   LV Name                lv00
>   VG Name                vg00
>   LV UUID                BfYyBv-VPNI-2f5s-0kVZ-AoSr-dGcY-gojAzs
>   LV Write Access        read/write
>   LV Creation host, time ha13a.mycharts.md, 2014-01-23 03:38:38 -0800
>   LV Status              available
>   # open                 1
>   LV Size                78.12 GiB
>   Current LE 2
>   Segments   1
>   Allocation inherit
>   Read ahead sectors auto
>   - currently set to 256
>   Block device   253:0
>
>   --- Logical volume ---
>   LV Path                /dev/vg_drbd1/lv_drbd1
>   LV Name                lv_drbd1
>   VG Name                vg_drbd1
>   LV UUID                HLVYSz-mZbQ-rCUm-OMBg-a1G9-vqdg-FwRp5S
>   LV Write Access        read/write
>   LV Creation host, time ha13b, 2016-02-26 13:48:51 -0800
>   LV Status              NOT available
>   LV Size                1.00 TiB
>   Current LE 262144
>   Segments   1
>   Allocation inherit
>   Read ahead sectors auto
>
>   --- Logical volume ---
>   LV Path                /dev/vg_drbd0/lv_drbd0
>   LV Name                lv_drbd0
>   VG Name                vg_drbd0
>   LV UUID                2q0e0v-P2g1-inu4-GKDN-cTyn-e2L7-jCJ1BY
>   LV Write Access        read/write
>   LV Creation host, time ha13a, 2016-02-26 13:48:06 -0800
>   LV Status              available
>   # open                 1
>   LV Size                1.00 TiB
>   Current LE 262144
>   Segments   1
>   Allocation inherit
>   Read ahead sectors auto
>   - currently set to 256
>   Block device   253:1
>
> And here is what it shows on ha13b...
>
> [root@ha13b ~]# lvdisplay
>   --- Logical volume ---
>   LV Path                /dev/vg_drbd1/lv_drbd1
>   LV Name                lv_drbd1
>   VG Name                vg_drbd1
>   LV UUID                HLVYSz-mZbQ-rCUm-OMBg-a1G9-vqdg-FwRp5S
>   LV Write Access        read/write
>   LV Creation host, time ha13b, 2016-02-26 13:48:51 -0800
>   LV Status              available
>   # open                 1
>   LV Size                1.00 TiB
>   Current LE 262144
>   Segments   1
>   Allocation inherit
>   Read ahead sectors auto
>   - currently set to 256
>   Block device   253:1
>
>   --- Logical volume ---
>   LV Path                /dev/vg_drbd0/lv_drbd0
>   LV Name                lv_drbd0
>   VG Name                vg_drbd0
>   LV UUID                2q0e0v-P2g1-inu4-GKDN-cTyn-e2L7-jCJ1BY
>   LV Write Access        read/write
>  

Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-26 Thread Igor Cicimov
On 27/02/2016 9:44 am, "Eric Robinson"  wrote:
>
> In the example you provided…
>
> ...
> filter = [ "a|/dev/vd.*|", "a|/dev/drbd*|", "r|.*|" ]
> write_cache_state = 0
> volume_list = [ "rootvg", "vg1", "vg2" ]
> ...
>
> …it looks like you are accepting anything  that begins with '/dev/vd.' or
'/dev/drbd' and rejecting everything else.

That's correct.

> Sorry for my dumb question, but if the goal is to filter out certain
> devices so LVM won't grab them before drbd does, wouldn't you want to
> reject them instead of accepting them?
>

Your root VG is on sda, so you have to let LVM read it. Your cluster VGs
are on drbd devices, so you tell LVM to read those too and NOT the
underlying block devices sd[bcdef...].
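
A quick way to sanity-check the filter after editing lvm.conf (standard LVM
commands; with your layout /dev/drbd0 and /dev/drbd1 should show up as PVs,
while the drbd backing devices should not):

vgscan
pvs -o pv_name,vg_name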

> Here is what I have written on my system...
>
> ...
> filter = [ "a|/dev/sda*|", "a|/dev/drbd*|", "r|.*|" ]
> write_cache_state = 0
> volume_list = [ "vg00", "vg_drbd0", "vg_drbd1" ]
> ...
>
Looks good.

> I have not rebooted yet because I am not sure this is correct.
> --
> Eric Robinson
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-25 Thread Igor Cicimov
On Fri, Feb 26, 2016 at 10:37 AM, Eric Robinson 
wrote:

> > Those are not the backing devices. Backing devices are the ones named
> > on the "disk " lines in your DRBD resource files - for example "disk
> /dev/vg/lv1".
>
> Sorry, for a second I was thinking of the drbd disks as backing devices
> for the LVM volumes that were created on them, but you're absolutely right.
>
> > So you need to exclude /dev/vg/lv1 (in that example) from being looked
> > at by LVM by filtering it out.
>
> On my servers, I have /dev/vg00/lv01 and /dev/vg00/lv02, which are the
> backing devices for /dev/drbd0 and /dev/drbd1 respectively. Then I made
> drbd0 and drbd1 into PVs, and created logical volume /dev/vg_drbd0/lv_drbd0
> on drbd0, and /dev/vg_drbd1/lv_drbd1 on drbd1.
>
> You're saying that I should filter /dev/vg00/lv01 and /dev/vg00/lv02 in
> lvm.conf? That won't have any undesirable side effects?
>
> --Eric
>
> Maybe this will help. In my case I have:

...
filter = [ "a|/dev/vd.*|", "a|/dev/drbd*|", "r|.*|" ]
write_cache_state = 0
volume_list = [ "rootvg", "vg1", "vg2" ]
...

so in your case, you need to replace */dev/vd.** with the *block device*
your *root volume* *vg00* lives on, i.e. /dev/sda or whatever device your
vg00 has been carved from.



>
>
>
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD9 drbdadm complains about fencing being in wrong section

2016-02-25 Thread Igor Cicimov
On Mon, Feb 22, 2016 at 9:36 PM, Igor Cicimov  wrote:

> Sorry, forgot to include the list in my reply.
>
>
> -- Forwarded message ------
> From: Igor Cicimov 
> Date: Mon, Feb 22, 2016 at 9:33 PM
> Subject: Re: [DRBD-user] DRBD9 drbdadm complains about fencing being in
> wrong section
> To: Roland Kammerer 
>
>
> Hi Roland,
>
> On Thu, Feb 18, 2016 at 8:03 PM, Roland Kammerer <
> roland.kamme...@linbit.com> wrote:
>
>> On Thu, Feb 18, 2016 at 10:37:58AM +1100, Igor Cicimov wrote:
>> > On Wed, Feb 17, 2016 at 8:18 PM, Roland Kammerer <
>> roland.kamme...@linbit.com
>> > > wrote:
>> >
>> > > On Wed, Feb 17, 2016 at 04:20:12PM +1100, Igor Cicimov wrote:
>> > > > Hi,
>> > > >
>> > > > I'm testing 9.0.1.1 installed from git and have a resource with
>> fencing
>> > > in
>> > > > the disk section:
>> > > >
>> > > > disk {
>> > > > on-io-error detach;
>> > > > fencing resource-and-stonith;
>> > > > }
>> > >
>> > > It belongs to net{}, and yes, the man page is outdated, I will fix
>> that.
>> > >
>> > >
>> >  Thanks Roland that worked. By the way I'm facing another issue when
>> > starting the service:
>> >
>> > # service drbd start
>> >  * Starting DRBD
>> > resources
>> > [
>> >  create res: vg1
>> >prepare disk: vg1
>> > adjust disk: vg1
>> > prepare net: vg1
>> > adjust peer_devices: vg1
>> > attempt to connect: vg1
>> > ]
>> > ...drbdadm: Unknown command 'sh-b-pri'
>>
>> Too old version of drbd-utils (which provides the init script)?
>>
>
> Nope, latest install from git as it can be seen in the output of the
> drbdadm version in my initial post.
>
> I'm also seeing another discrepancy between the utility and what is in the
> docs:
>
> # drbdadm status vg1 --verbose --statistics
> drbdadm: unrecognized option '--statistics'
> try 'drbdadm help'
>
> where '--statistics' is suppose to be a valid option according to the docs.
>
>
>> 'sh-b-pri' was dropped, see the corresponding comment in the init
>> script:
>> # Become primary if configured
>> # Currently, this is necessary for drbd8
>> # drbd9 supports automatic promote and removes the
>> # "sh-b-pri" command.
>> $DRBDADM sh-b-pri all || true
>>
>> >
>> > although drbdadm tells me that all is fine:
>> >
>> > # drbdadm status
>> > vg1 role:Primary
>> >   disk:UpToDate
>> >   drbd02 role:Primary
>> > peer-disk:UpToDate
>>
>> Looks good.
>>
>> >
>> > Now this is a problem since I can't set drbd under pacemaker control:
>> >
>> > Failed actions:
>> > p_drbd_vg1_start_0 (node=drbd01, call=9, rc=1, status=Timed Out,
>> > last-rc-change=Thu Feb 18 10:07:43 2016
>> > , queued=20004ms, exec=1ms
>> > ): unknown error
>> > p_drbd_vg1_start_0 (node=drbd02, call=9, rc=1, status=Timed Out,
>> > last-rc-change=Thu Feb 18 10:07:43 2016
>> > , queued=20005ms, exec=0ms
>> > ): unknown error
>> >
>> > I've been running this kind of setups with 8.4.4 with no issues. Has
>> > something changed around drbd management in version 9 or is it just
>> > something wrong with the initd script?
>>
>> AFAIK you should not mix the drbd service with pacemaker anyways. Choose
>> one, pacemaker. Pacemaker is really not my field of expertise, so
>> somebody else has to jump in here.
>>
>>
> I'm not mixing them drbd is set to not autostart and should be under
> pacemaker control...eventually.
>
>
>> > Finally thanks for the drbdmanage pointer I'll do some research about
>> it (I
>> > think I saw some presentation of yours on the web). I don't see it
>> > installed on my system (Ubuntu 14.04.4 LTS) so guess it is not packaged
>> > with drbd-utils but is a separate toll. I have a question though, since
>> I
>> > have done the config already manually, is the drbdmanage going to take
>> over
>> > based on already existing configuration or I will need to reconfigure
>> > everything all over again?
>>
>> It is a separate package and not part of drbd-utils. AFAIR downstream
>> Debian/Ubuntu did not pick it up yet. 

Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-25 Thread Igor Cicimov
On Fri, Feb 26, 2016 at 10:10 AM, Eric Robinson 
wrote:

> > The usual problem with LVM on top of DRBD is that the
>
> > backing device gets seen as an LVM PV and is grabbed by
>
> > LVM before DRBD starts up. That means DRBD cannot access
>
> > it since it's already in use. Solution: adjust the filter
>
> > lines in lvm.conf to exclude the backing devices from
>
> > LVM consideration.
>
> You mean /dev/drbd0 and /dev/drbd1?
>
>
>
> --Eric
>

The instructions are here:
https://drbd.linbit.com/users-guide/s-lvm-drbd-as-pv.html

Then in pacemaker, example from one of my clusters:

primitive p_lvm_vg1 LVM \
params volgrpname=vg1 \
op monitor interval=60 timeout=30 \
op start timeout=30 interval=0 \
op stop timeout=30 interval=0 \
meta target-role=Started
colocation c_vg1_on_drbd +inf: g_vg1 ms_drbd_vg1:Master
order o_drbd_before_vg1 +inf: ms_drbd_vg1:promote g_vg1:start

to tie them together and start the volume on the drbd master.
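
Here g_vg1 and ms_drbd_vg1 are assumed to be defined elsewhere in that
cluster's config, roughly along these lines (p_fs_vg1 and p_drbd_vg1 are
placeholder names for the Filesystem and drbd primitives):

group g_vg1 p_lvm_vg1 p_fs_vg1
ms ms_drbd_vg1 p_drbd_vg1 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true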

Hope this helps.

Igor
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-25 Thread Igor Cicimov
On 26/02/2016 9:51 AM, "Eric Robinson"  wrote:
>
> I'm confused, I don't see the VG(s) and LV(s) under cluster control; have
you done that bit?
>
> (blank stare)
>
> This is where I admit that I have no idea what you mean. I’ve been
building clusters with drbd for a decade, and I’ve always had drbd on top
of LVM and all has been well. This is the first time I have LVM on top of
drbd. What am I missing?
>
Well, the file system lives on top of LVM, so when you move it to the other
node the VG/LV needs to follow too.
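
A rough sketch of what that means in crm terms, reusing the LVM resource and
constraint pattern shown elsewhere in this thread (volume group names as in
your setup):

primitive p_lvm_drbd0 LVM \
        params volgrpname=vg_drbd0
colocation c_clust17 inf: p_fs_clust17 p_vip_clust17 p_lvm_drbd0 ms_drbd0:Master
order o_clust17 inf: ms_drbd0:promote p_lvm_drbd0:start p_fs_clust17 p_vip_clust17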

> --Eric
>
>
>
>
>
>
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-25 Thread Igor Cicimov
On 26/02/2016 8:53 AM, "Eric Robinson"  wrote:
>
> > And your pacemaker config is???
>
> > Run
>
> > #  crm configure show
> > and paste it here.
>
> Pacemaker 1.1.12.
>
>
>
> Here’s the config…
>
>
>
>
>
> [root@ha13a /]# crm configure show
>
> node ha13a
>
> node ha13b
>
> primitive p_drbd0 ocf:linbit:drbd \
>
> params drbd_resource=ha01_mysql \
>
> op monitor interval=31s role=Slave \
>
> op monitor interval=30s role=Master
>
> primitive p_drbd1 ocf:linbit:drbd \
>
> params drbd_resource=ha02_mysql \
>
> op monitor interval=29s role=Slave \
>
> op monitor interval=28s role=Master
>
> primitive p_fs_clust17 Filesystem \
>
> params device="/dev/vg_drbd0/lv_drbd0" directory="/ha01_mysql"
fstype=ext3 options=noatime
>
> primitive p_fs_clust18 Filesystem \
>
> params device="/dev/vg_drbd1/lv_drbd1" directory="/ha02_mysql"
fstype=ext3 options=noatime
>
> primitive p_vip_clust17 IPaddr2 \
>
> params ip=192.168.9.104 cidr_netmask=32 \
>
> op monitor interval=30s
>
> primitive p_vip_clust18 IPaddr2 \
>
> params ip=192.168.9.105 cidr_netmask=32 \
>
> op monitor interval=30s
>
> ms ms_drbd0 p_drbd0 \
>
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=true target-role=Master
>
> ms ms_drbd1 p_drbd1 \
>
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=true target-role=Master
>
> location cli-prefer-p_fs_clust17 p_fs_clust17 role=Started inf: ha13b
>
> colocation c_clust17 inf: p_vip_clust17 ms_drbd0:Master
>
> colocation c_clust18 inf: p_vip_clust18 ms_drbd1:Master
>
> order o_clust17 inf: ms_drbd0:promote p_vip_clust17
>
> order o_clust18 inf: ms_drbd1:promote p_vip_clust18
>
> property cib-bootstrap-options: \
>
> dc-version=1.1.11-97629de \
>
> cluster-infrastructure="classic openais (with plugin)" \
>
> no-quorum-policy=ignore \
>
> stonith-enabled=false \
>
> maintenance-mode=false \
>
> expected-quorum-votes=2 \
>
> last-lrm-refresh=1456434863
>

I'm confused, I don't see the VG(s) and LV(s) under cluster control; have you
done that bit?
>
>
> crm_mon shows…
>
>
>
> Last updated: Thu Feb 25 13:49:06 2016
>
> Last change: Thu Feb 25 13:49:04 2016
>
> Stack: classic openais (with plugin)
>
> Current DC: ha13b - partition with quorum
>
> Version: 1.1.11-97629de
>
> 2 Nodes configured, 2 expected votes
>
> 8 Resources configured
>
>
>
>
>
> Online: [ ha13a ha13b ]
>
>
>
> Master/Slave Set: ms_drbd0 [p_drbd0]
>
>  Masters: [ ha13a ]
>
>  Slaves: [ ha13b ]
>
> Master/Slave Set: ms_drbd1 [p_drbd1]
>
>  Masters: [ ha13b ]
>
>  Slaves: [ ha13a ]
>
> p_vip_clust17   (ocf::heartbeat:IPaddr2):   Started ha13a
>
> p_vip_clust18   (ocf::heartbeat:IPaddr2):   Started ha13b
>
> p_fs_clust17    (ocf::heartbeat:Filesystem):    Started ha13a
>
> p_fs_clust18    (ocf::heartbeat:Filesystem):    Started ha13b
>
>
>
> Failed actions:
>
> p_fs_clust17_start_0 on ha13b 'not installed' (5): call=124,
status=complete, last-rc-change='Thu Feb 25 13:49:04 2016', queued=0ms,
exec=46ms
>
> p_fs_clust18_start_0 on ha13a 'not installed' (5): call=124,
status=complete, last-rc-change='Thu Feb 25 13:49:04 2016', queued=0ms,
exec=47ms
>
>
>
> …however, the filesystems are properly mounted.
>
>
>
> When I try to failover, it fails…
>
>
>
> --
>
> Eric Robinson
>
>
>
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-25 Thread Igor Cicimov
On 25/02/2016 7:47 PM, "Eric Robinson"  wrote:
>
> I have a 2-node cluster, where each node is primary for one drbd volume
and secondary for the other node’s drbd volume. Replication is A->B for
drbd0 and A<-B for drbd1. I have a logical volume and filesystem on each
drbd device. When I try to failover resources, the filesystem fails to
mount because lvdisplay shows the logical volume is listed as “not
available” on the target node. Is there some trick to getting LVM on DRBD
to fail over properly?
>
>
>
> --
>
> Eric Robinson
>
>
>
>
>
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
And your pacemaker config is??? Run
#  crm configure show
and paste it here.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Fwd: DRBD9 drbdadm complains about fencing being in wrong section

2016-02-22 Thread Igor Cicimov
Sorry, forgot to include the list in my reply.

-- Forwarded message --
From: Igor Cicimov 
Date: Mon, Feb 22, 2016 at 9:33 PM
Subject: Re: [DRBD-user] DRBD9 drbdadm complains about fencing being in
wrong section
To: Roland Kammerer 


Hi Roland,

On Thu, Feb 18, 2016 at 8:03 PM, Roland Kammerer  wrote:

> On Thu, Feb 18, 2016 at 10:37:58AM +1100, Igor Cicimov wrote:
> > On Wed, Feb 17, 2016 at 8:18 PM, Roland Kammerer <
> roland.kamme...@linbit.com
> > > wrote:
> >
> > > On Wed, Feb 17, 2016 at 04:20:12PM +1100, Igor Cicimov wrote:
> > > > Hi,
> > > >
> > > > I'm testing 9.0.1.1 installed from git and have a resource with
> fencing
> > > in
> > > > the disk section:
> > > >
> > > > disk {
> > > > on-io-error detach;
> > > > fencing resource-and-stonith;
> > > > }
> > >
> > > It belongs to net{}, and yes, the man page is outdated, I will fix
> that.
> > >
> > >
> >  Thanks Roland that worked. By the way I'm facing another issue when
> > starting the service:
> >
> > # service drbd start
> >  * Starting DRBD
> > resources
> > [
> >  create res: vg1
> >prepare disk: vg1
> > adjust disk: vg1
> > prepare net: vg1
> > adjust peer_devices: vg1
> > attempt to connect: vg1
> > ]
> > ...drbdadm: Unknown command 'sh-b-pri'
>
> Too old version of drbd-utils (which provides the init script)?
>

Nope, it is the latest install from git, as can be seen in the output of
drbdadm --version in my initial post.

I'm also seeing another discrepancy between the utility and what is in the
docs:

# drbdadm status vg1 --verbose --statistics
drbdadm: unrecognized option '--statistics'
try 'drbdadm help'

where '--statistics' is supposed to be a valid option according to the docs.


> 'sh-b-pri' was dropped, see the corresponding comment in the init
> script:
> # Become primary if configured
> # Currently, this is necessary for drbd8
> # drbd9 supports automatic promote and removes the
> # "sh-b-pri" command.
> $DRBDADM sh-b-pri all || true
>
> >
> > although drbdadm tells me that all is fine:
> >
> > # drbdadm status
> > vg1 role:Primary
> >   disk:UpToDate
> >   drbd02 role:Primary
> > peer-disk:UpToDate
>
> Looks good.
>
> >
> > Now this is a problem since I can't set drbd under pacemaker control:
> >
> > Failed actions:
> > p_drbd_vg1_start_0 (node=drbd01, call=9, rc=1, status=Timed Out,
> > last-rc-change=Thu Feb 18 10:07:43 2016
> > , queued=20004ms, exec=1ms
> > ): unknown error
> > p_drbd_vg1_start_0 (node=drbd02, call=9, rc=1, status=Timed Out,
> > last-rc-change=Thu Feb 18 10:07:43 2016
> > , queued=20005ms, exec=0ms
> > ): unknown error
> >
> > I've been running this kind of setups with 8.4.4 with no issues. Has
> > something changed around drbd management in version 9 or is it just
> > something wrong with the initd script?
>
> AFAIK you should not mix the drbd service with pacemaker anyways. Choose
> one, pacemaker. Pacemaker is really not my field of expertise, so
> somebody else has to jump in here.
>
>
I'm not mixing them; drbd is set to not autostart and should be under
pacemaker control... eventually.


> > Finally thanks for the drbdmanage pointer I'll do some research about it
> (I
> > think I saw some presentation of yours on the web). I don't see it
> > installed on my system (Ubuntu 14.04.4 LTS) so guess it is not packaged
> > with drbd-utils but is a separate toll. I have a question though, since I
> > have done the config already manually, is the drbdmanage going to take
> over
> > based on already existing configuration or I will need to reconfigure
> > everything all over again?
>
> It is a separate package and not part of drbd-utils. AFAIR downstream
> Debian/Ubuntu did not pick it up yet. For Ubuntu there is a PPA if you
> want to try it:
>
> https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack
>
> Currently there is no importing of manually crafted res files. To me, it
> is actually very low on the feature list I care about. Depending on the
> size/time for resync/... it is probably the easiest if you start from
> scratch, but then it is really easy:
>
> node1-n:$ vgcreate drbdpool ...
> node1:$ drbdmanage init
> node1:$ drbdmanage add-node node2 IP
> node1:$ drbdmanage new-volume foo 10G --deploy 2
>
> http://drbd.linbit.com/users-guide-9.0/ch-admin-drbdmanage.html
>
> And you are done, lvm configured, DRBD on top of it.
>
> Regards, rck
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD9 drbdadm complains about fencing being in wrong section

2016-02-17 Thread Igor Cicimov
On Wed, Feb 17, 2016 at 8:18 PM, Roland Kammerer  wrote:

> On Wed, Feb 17, 2016 at 04:20:12PM +1100, Igor Cicimov wrote:
> > Hi,
> >
> > I'm testing 9.0.1.1 installed from git and have a resource with fencing
> in
> > the disk section:
> >
> > disk {
> > on-io-error detach;
> > fencing resource-and-stonith;
> > }
>
> It belongs to net{}, and yes, the man page is outdated, I will fix that.
>
> Additionally, drbdmanage is nice to generate cluster configs, it parses
> the output of drbdsetup, therefore it is always right, even if the man
> page is outdated ;-) Otherwise the following would not have completed at
> all:
>
> drbdmanage net-options --fencing resource-and-stonith --resource xyz
> drbdmanage list-options xyz
>
>
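
For reference, a minimal sketch of the corrected placement for DRBD 9, with
the fencing policy moved out of disk{} and into net{} as Roland describes:

disk {
        on-io-error detach;
}
net {
        fencing resource-and-stonith;
}
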
Thanks Roland, that worked. By the way, I'm facing another issue when
starting the service:

# service drbd start
 * Starting DRBD
resources
[
 create res: vg1
   prepare disk: vg1
adjust disk: vg1
prepare net: vg1
adjust peer_devices: vg1
attempt to connect: vg1
]
...drbdadm: Unknown command 'sh-b-pri'

although drbdadm tells me that all is fine:

# drbdadm status
vg1 role:Primary
  disk:UpToDate
  drbd02 role:Primary
peer-disk:UpToDate

Now this is a problem since I can't set drbd under pacemaker control:

Failed actions:
p_drbd_vg1_start_0 (node=drbd01, call=9, rc=1, status=Timed Out,
last-rc-change=Thu Feb 18 10:07:43 2016
, queued=20004ms, exec=1ms
): unknown error
p_drbd_vg1_start_0 (node=drbd02, call=9, rc=1, status=Timed Out,
last-rc-change=Thu Feb 18 10:07:43 2016
, queued=20005ms, exec=0ms
): unknown error

I've been running this kind of setup with 8.4.4 with no issues. Has
something changed around drbd management in version 9, or is it just
something wrong with the init script?

Finally, thanks for the drbdmanage pointer, I'll do some research on it (I
think I saw a presentation of yours on the web). I don't see it
installed on my system (Ubuntu 14.04.4 LTS), so I guess it is not packaged
with drbd-utils but is a separate tool. I have a question though: since I
have already done the config manually, is drbdmanage going to take over
based on the existing configuration, or will I need to reconfigure
everything all over again?

Thanks,
Igor
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD9 drbdadm complains about fencing being in wrong section

2016-02-16 Thread Igor Cicimov
Hi,

I'm testing 9.0.1.1 installed from git and have a resource with fencing in
the disk section:

disk {
on-io-error detach;
fencing resource-and-stonith;
}

but when running:

# drbdadm create-md vg1
drbd.d/vg1.res:10: Parse error: 'resync-rate | c-plan-ahead |
c-delay-target | c-fill-target | c-max-rate | c-min-rate' expected,
but got 'fencing'

As far as I can see these are still valid options in the DRBD9 guide. The
utils are the latest 8.9.6, also installed from git.

# dpkg -l | grep drbd
ii  drbd-dkms   9.0.1-1
all  RAID 1 over TCP/IP for Linux module source
ii  drbd-utils  8.9.6-1
amd64RAID 1 over TCP/IP for Linux (user utilities)

# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ c6e62702d5e4fb2cf6b3fa27e67cb0d4b399a30b\
build\ by\ @drbd02\,\ 2016-02-17\ 11:58:21
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090001
DRBDADM_VERSION_CODE=0x080906
DRBDADM_VERSION=8.9.6

If this is correct, where does the fencing go then?

Thanks,
Igor
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD spontaneously loses connection

2015-12-10 Thread Igor Cicimov
On 11/12/2015 4:40 PM, "Digimer"  wrote:
>
> On 11/12/15 12:12 AM, Igor Cicimov wrote:
> >
> >
> > On Fri, Dec 11, 2015 at 3:08 AM, Digimer  > <mailto:li...@alteeve.ca>> wrote:
> >
> > On 10/12/15 09:27 AM, Fabrizio Zelaya wrote:
> > > Thank you Lars and Adam for your recommendations.
> > >
> > > I have ping-timeout set to 5 and it still happened.
> > >
> > > Lars with this fencing idea. I have been contemplating this,
however I
> > > am rebuilding servers which are already in place,  All the
configuration
> > > related to cman and drbd  was literally copied and pasted from
the old
> > > servers.
> >
> > You can hook DRBD into cman's fencing with the rhch_fence fence
handler,
> > which is included with DRBD. This assumes, of course, that you have
> > cman's fencing configured properly.
> >
> > > Is this concept of having dual-primary without fencing being a
> > > mis-configuration something new?
> >
> > No, but it is often overlooked.
> >
> > > I ask this because as you would imagine by now, the old servers
are
> > > working with the exact same configuration and have no problems at
all.
> > > And while your idea makes perfect sense it would also make sense
to have
> > > the exact same problem on every version of drbd.
> >
> > Fencing is like a seatbelt. You can drive for years never needing
it,
> > but when you do, it saves you from hitting the windshield.
> >
> > Fencing is 100% needed, and all the more so with dual-primary.
> >
> >
> > So this practically excludes the DRBD usage in the public clouds like
> > AWS, right? Here shutting down the peer's power supply is impossible and
> > using the CLI has no guarantee that shutting down a peer VM will ever
> > happen since has to be done via the network. Even if the VM's have
> > multiple network interfaces provisioned for redundancy in different
> > subnets they are still virtual and there is possibility they end up on
> > the same physical interface on the hypervisor host which has failed (or
> > it's switch), causing the split brain in the first place.
>
> As much as I may personally think that public clouds are not good
> platforms for HA...
>
> No, AWS is possible to use. There is a fence agent called (I believe)
> fence_ec2
Yep, that agent exists, although it might create some security issues due to
the permissions needed to run the API CLI calls (SSH key, certificate, IAM
role, etc.) on the VM, so it might not be an option for everyone.

> that works by requesting an instance be terminated and then
> waiting for the confirmation of that task completing. You would
> configure this in cluster.conf and then hook DRBD into it by using
> 'fence-handler "/path/to/rhcs_fence";' and 'fencing-policy
> "resource-and-stonith";'. Then, if a node is lost, DRBD will block and
> ask cman to fence the node, and wait for a success message.
>
> All the other things you mention are reasons why I personally don't
> consider the cloud a good platform, but it is used. For me, I insist on
> dual fence methods; First using IPMI and, if that fails, falling back to
> a pair of switched PDUs to cut the (redundant) PSUs off.
Yeah no such goodies in ec2 :-)

>
> > > There is a difference to be consider I guess. I am now installing
this
> > > servers using SL6 as you saw in my original email, the old
servers are
> > > working with Debian 6.0.7
> > >
> > > The old servers are running  drbd8-utils  2:8.3.7-2.1
> >
> > --
> > Digimer
> > Papers and Projects: https://alteeve.ca/w/
> > What if the cure for cancer is trapped in the mind of a person
without
> > access to education?
> > ___
> > drbd-user mailing list
> > drbd-user@lists.linbit.com <mailto:drbd-user@lists.linbit.com>
> > http://lists.linbit.com/mailman/listinfo/drbd-user
> >
> >
>
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD spontaneously loses connection

2015-12-10 Thread Igor Cicimov
On Fri, Dec 11, 2015 at 3:08 AM, Digimer  wrote:

> On 10/12/15 09:27 AM, Fabrizio Zelaya wrote:
> > Thank you Lars and Adam for your recommendations.
> >
> > I have ping-timeout set to 5 and it still happened.
> >
> > Lars with this fencing idea. I have been contemplating this, however I
> > am rebuilding servers which are already in place,  All the configuration
> > related to cman and drbd  was literally copied and pasted from the old
> > servers.
>
> You can hook DRBD into cman's fencing with the rhch_fence fence handler,
> which is included with DRBD. This assumes, of course, that you have
> cman's fencing configured properly.
>
> > Is this concept of having dual-primary without fencing being a
> > mis-configuration something new?
>
> No, but it is often overlooked.
>
> > I ask this because as you would imagine by now, the old servers are
> > working with the exact same configuration and have no problems at all.
> > And while your idea makes perfect sense it would also make sense to have
> > the exact same problem on every version of drbd.
>
> Fencing is like a seatbelt. You can drive for years never needing it,
> but when you do, it saves you from hitting the windshield.
>
> Fencing is 100% needed, and all the more so with dual-primary.
>

So this practically excludes DRBD usage in public clouds like AWS,
right? There, shutting down the peer's power supply is impossible, and using
the CLI gives no guarantee that shutting down a peer VM will ever happen,
since it has to be done via the network. Even if the VMs have multiple
network interfaces provisioned for redundancy in different subnets, they are
still virtual, and there is a possibility they end up on the same physical
interface of the hypervisor host which has failed (or its switch), causing
the split-brain in the first place.


>
> > There is a difference to be consider I guess. I am now installing this
> > servers using SL6 as you saw in my original email, the old servers are
> > working with Debian 6.0.7
> >
> > The old servers are running  drbd8-utils  2:8.3.7-2.1
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] repeatable, infrequent, loss of data with DRBD

2015-08-20 Thread Igor Cicimov
On 20/08/2015 6:58 PM, "Matthew Vernon"  wrote:
>
> Hi,
>
> > Are you sure LVM only uses the DRBD device to write data to and not the
> > backend disk? We've had this issue in the past and this was caused by
> > LVM which scans all the devices for PV's, VG's and LV's and sometimes
> > pick the wrong device. You can fix this by changing the filter in the
> > lvm.conf file. If you change this, don't forget to remove the LVM cache
> > file first and then to rescan everything.
>
> I'm reasonably sure, yes - I have LVM configured to use drbd devices thus:
>
> preferred_names = [ "^/dev/drbd" ]
>
> Regards,
>
> Matthew
>
And since you use dual-primary mode, you also use cLVM and a cluster-aware
file system like GFS or OCFS2, right?
___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] How to configure drbd stacked on 2 clusters with virtual IPs

2015-06-04 Thread Igor Cicimov
On 04/06/2015 8:50 PM, "Kimly Heler"  wrote:
>
> Hello,
>
> I have 2 of two-node clusters in different data centers. I have
configured drbd of lvm resource on each cluster. Now I need to configure a
stacked drbd on top of these 2 clusters.
>
> It should be similar to section 8.4.2 in
http://drbd.linbit.com/users-guide-emb/s-pacemaker-stacked-resources.html
except
I need to find a way to assign floating IPs to the stacked drbd.
>
> Here is what I derived from the user guide:
>
> resource clusterA {
>   net {
> protocol C;
>   }
>
>   on host1-datacenterA {
> device /dev/drbd0;
> disk   /dev/sda6;
> address10.0.0.1:7788;
> meta-disk internal;
>   }
>
>   on host2-datacenterA {
> device/dev/drbd0;
> disk  /dev/sda6;
> address   10.0.0.2:7788;
> meta-disk internal;
>   }
> }
>
> resource clusterB {
>   net {
> protocol C;
>   }
>
>   on host1-datacenterB {
> device /dev/drbd0;
> disk   /dev/sda6;
> address10.0.0.3:7788;
> meta-disk internal;
>   }
>
>   on host2-datacenterB {
> device/dev/drbd0;
> disk  /dev/sda6;
> address   10.0.0.3:7788;
> meta-disk internal;
>   }
> }
>
> resource r0 {
>   net {
> protocol A;
>   }
>
>   stacked-on-top-of clusterA {
> device /dev/drbd10;
> address192.168.42.1:7788; # Virtual IP of cluster A
>   }
>
>   stacked-on-top-of clusterB {
> device /dev/drbd10;
> address192.168.42.2:7788; # Virtual IP of cluster B
>   }
> }
>
> With this, it would error out with the virtual IP address with "IP not
found on host".
>
> I also looked at Floating peers in the user guide, but floating
192.168.42.1:7789 would replace the whole section stacked-on-top-of.
>
> How can I tell drbd that r0 is a stacked drbd resource, and the peers IPs
are floating?
>

Did you try setting:

net.ipv4.ip_nonlocal_bind=1

so apps can bind to a non-local IP?
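
For example (standard Linux sysctl handling, nothing DRBD-specific):

# apply immediately
sysctl -w net.ipv4.ip_nonlocal_bind=1
# persist across reboots
echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf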
> Thanks in advance
>
> Kimly
>
>
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD in LXC

2014-06-02 Thread Igor Cicimov
On 03/06/2014 7:50 AM, "Lars Ellenberg"  wrote:
>
> On Mon, Jun 02, 2014 at 11:07:56AM +1000, Igor Cicimov wrote:
> > On 02/06/2014 6:14 AM, "Lars Ellenberg" 
wrote:
> > >
> > > On Sat, May 31, 2014 at 10:48:09PM +1000, Igor Cicimov wrote:
> > > > Hi all,
> > > >
> > > > I've been searching for a solution about this but couldn't find
anything
> > > > but couple of threads ending without any outcome.
> > > >
> > > > I have drbd-8.4.4 installed in Ubuntu-12.04 container running on
> > > > Ubuntu-12.04 host. I can only create the metadata and then the try
to
> > bring
> > > > the resource up fails with the following message:
> > > >
> > > > root@lxc01:~# drbdadm up r0
> > > > Could not connect to 'drbd' generic netlink family
> > > >
> > > > The kernel module loads fine and /proc/drbd directory exists. For
sure
> > this
> > > > is not related to apparmor since I get the same issue with the
service
> > > > stopped on the host and rules teardown.
> > > >
> > > > Has there been any solution for this or DRBD can't run inside linux
> > > > container at all?
> > >
> > > Possibly could be made to work.
> > > Or not.
> > > I don't know.
> > >
> > > There is a lot of things that would need to be abstracted
> > > into per container namespaces. I'm not even sure if all
> > > the infrastructure to abstract device numbers is there yet,
> > > even in recent kernels.
> > >
> > > *IF* it can be made to work,
> > > properly isolating different containers against each other would
> > > require ... let's say "code changes" ... in the module
> > > (there is only one kernel, so just one instance of this kernel module,
> > >  there are many containers,
> > >  it would likely need to become "container aware" on various levels).
> > >
> > > Appart from that: I don't see the use case.
> > > Having DRBD on the host, outside the containers,
> > > and replicate the containers state and content
> > > is a known working use case.
> > >
> > > But what for would you want DRBD *inside* the container?
> > >
> >
> > Why not? Its working inside vm's why not inside containers too? They are
> > becoming more popular as light virtualization solution so for sure this
> > will become necessity down the road. Are you saying this should not be
> > available on user level in case lets say someone wants to try a cluster
of
> > two containers with drbd replicated device?
>
> Each VM has its own "kernel", its own "drbd" module.
> => works.
> But even there, usually you put the VM image on top of DRBD,
> not some DRBD inside the VM (even though that is possible).
>
> Container: does not work.
> Possibly could be made to work.
> Probably would not only require changes in DRBD module code,
> but in generic kernel as well. I don't know.
> I do not see that happening anytime soon.
>
> Maybe find someone who knows,
> and get them to explain to me
> what would be necessary to do in DRBD
> to make it play nice from within containers...
>
> But I really don't see why.
>
> If you create a container,
> you tell it what "storage" to use.
> Put that on top of DRBD.
> Done.
>
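
For example, with the classic LXC setup on 12.04 this boils down to roughly
the following on the host (device, paths and container name are illustrative):

# host side: put the container's rootfs on the DRBD-backed filesystem
mount /dev/drbd0 /var/lib/lxc/lxc01/rootfs
# and in /var/lib/lxc/lxc01/config:
lxc.rootfs = /var/lib/lxc/lxc01/rootfs
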
Thanks Lars, sounds like a lot of work. The only difference is that the
control over drbd is outside the container itself, making drbd
invisible to the users inside. But it is what it is, so we have to drop
drbd from the cluster stack at the container level.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD in LXC

2014-06-01 Thread Igor Cicimov
On 02/06/2014 6:14 AM, "Lars Ellenberg"  wrote:
>
> On Sat, May 31, 2014 at 10:48:09PM +1000, Igor Cicimov wrote:
> > Hi all,
> >
> > I've been searching for a solution for this but couldn't find anything
> > beyond a couple of threads that ended without any outcome.
> >
> > I have drbd-8.4.4 installed in an Ubuntu-12.04 container running on an
> > Ubuntu-12.04 host. I can only create the metadata; the attempt to bring
> > the resource up then fails with the following message:
> >
> > root@lxc01:~# drbdadm up r0
> > Could not connect to 'drbd' generic netlink family
> >
> > The kernel module loads fine and the /proc/drbd directory exists. This is
> > definitely not related to AppArmor, since I get the same issue with the
> > service stopped on the host and its rules torn down.
> >
> > Has there been any solution for this, or can DRBD not run inside a Linux
> > container at all?
>
> Possibly could be made to work.
> Or not.
> I don't know.
>
> There are a lot of things that would need to be abstracted
> into per container namespaces. I'm not even sure if all
> the infrastructure to abstract device numbers is there yet,
> even in recent kernels.
>
> *IF* it can be made to work,
> properly isolating different containers against each other would
> require ... let's say "code changes" ... in the module
> (there is only one kernel, so just one instance of this kernel module,
>  there are many containers,
>  it would likely need to become "container aware" on various levels).
>
> Apart from that: I don't see the use case.
> Having DRBD on the host, outside the containers,
> and replicating the containers' state and content
> is a known working use case.
>
> But what would you want DRBD *inside* the container for?
>
Why not? It's working inside VMs, so why not inside containers too? They are
becoming more popular as a lightweight virtualization solution, so this will
surely become a necessity down the road. Are you saying this should not be
available at the user level in case, let's say, someone wants to try a cluster
of two containers with a DRBD-replicated device?
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD in LXC

2014-06-01 Thread Igor Cicimov
Hi all,

I've been searching for a solution for this but couldn't find anything
beyond a couple of threads that ended without any outcome.

I have drbd-8.4.4 installed in an Ubuntu-12.04 container running on an
Ubuntu-12.04 host. I can only create the metadata; the attempt to bring the
resource up then fails with the following message:

root@lxc01:~# drbdadm up r0
Could not connect to 'drbd' generic netlink family

The kernel module loads fine and the /proc/drbd directory exists. This is
definitely not related to AppArmor, since I get the same issue with the
service stopped on the host and its rules torn down.

Has there been any solution for this, or can DRBD not run inside a Linux
container at all?
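
In case it helps narrow things down, this is the check I would compare on the
host and inside the container to see whether the module's generic netlink
family is visible at all (genl comes with iproute2; how it behaves inside the
container is exactly the part I'm unsure about):

modprobe drbd
genl ctrl list | grep -B1 -A3 drbd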

Thanks,
Igor
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRDB on one host

2014-05-14 Thread Igor Cicimov
Can't you just use an EBS-optimized instance?

On 14/05/2014 7:47 AM, "Csanad Novak"  wrote:
>
> Hi Arnold,
>
> I guess the challenge here is that the instance storage gets reset
> completely (back to the point where it's just a raw device without
> partitions) on every stop/start cycle.
>
> Currently I'm using a less-than-satisfactory solution on non-critical
> servers, as follows:
>
> I’ve got mounted my instance-storage as /dev/sdb1 to /var
> I’ve got mounted my ebs device as /dev/sdc1 to /mnt/var-backup
>
> I’ve lsyncd configured to sync from /var to /mnt/var-backup
>
> I've written a little bash script for auto-partitioning and mounting. When
> the instance gets stopped for any reason, the instance storage gets reset
> and the whole /var directory is destroyed. After boot-up my script
> repartitions the instance storage, mounts it, and then automatically rsyncs
> the data back from /mnt/var-backup.
>
> There are several issues with this architecture, but the biggest one is
> lsyncd, which wasn't designed for this. If I could change lsyncd to DRBD,
> that would be great.
>
> I must stress again: instance-storage is a local hard-disk based block
device, while the ebs storage is a network attached block device.
>
> Cheers:
>
> Csanad
>
>
> From: Arnold Krille arn...@arnoldarts.de
> Reply: Arnold Krille arn...@arnoldarts.de
> Date: 14 May 2014 at 6:18:25
> To: drbd-user@lists.linbit.com drbd-user@lists.linbit.com
> Subject:  Re: [DRBD-user] DRDB on one host
>
>> On Tue, 13 May 2014 09:57:21 +1200 Csanad Novak
>>  wrote:
>> > Recently on the latest AWS Summit I’ve seen an interesting setup for
>> > utilising the AWS EC2 instance storage. The instance storage is the
>> > local filesystem on the Amazon cloud, unfortunately it is not
>> > persistent - all data get lost if the instance stops. For persistent
>> > storage one must use an so called EBS volume, which is a network
>> > attached block device. Unfortunately the EBS storage performance varies,
>> > depending on the neighbouring instances' traffic. In an ideal world one
>> > could use the instance storage without worrying about data loss.
>> >
>> > On the latest AWS Summit a presenter proposed an architecture where
>> > the instance storage was utilised and was synced with DRBD to the EBS
>> > block device:
>> > http://www.slideshare.net/AmazonWebServices/so-you-think-you-are-an-aws-ninja-dean-samuels
>> > (slide 11).
>> >
>> > I wonder what an example config would look like. I went through the
>> > tutorials quickly, but couldn't find any example with a one-host setup.
>>
>> drbd on one host. Isn't that what raid/md is?
>>
>> I haven't had a chance to play with aws but if both ebs volumes and the
>> instance-storage are block devices, it should be easy to mirror them
>> with md or lvm.
>>
>> - Arnold
>> 
>
> --
> Kind regards,
>
> Csanad Novak
> Technical Lead
> P / 09 353 1234
> M / 021 060 1531
> E / csa...@pixelfusion.co.nz
> W / PIXELFUSION.CO.NZ / WHERE DESIGN & DEVELOPMENT FUSE
> A / SUITE 403 "FORBIDDEN", THE IRON BANK, 150 KARANGAHAPE RD, AUCKLAND /
PIXEL FUSION LTD, PO BOX 68 564, NEWTON, AUCKLAND / MAP
>
>
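
For what it's worth, a minimal sketch of the md mirror Arnold suggests; the
device names (/dev/xvdb for the instance store, /dev/xvdf for the EBS volume)
and the mount point are my assumptions, not a tested setup:

# RAID1 of the local instance-store disk and the EBS volume;
# --write-mostly keeps normal reads on the fast local disk
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
  /dev/xvdb --write-mostly /dev/xvdf
mkfs.ext4 /dev/md0
mount /dev/md0 /var
# after a stop/start cycle the instance-store leg is blank again;
# re-adding it lets md resync it from the EBS copy
mdadm /dev/md0 --add /dev/xvdb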
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD startup error - r0: UnknownMandatoryTag

2012-01-02 Thread Igor Cicimov
Hi Felix,

Thanks for the info, and sorry for the late response; I had a week off for
the holidays. When I try to manually establish the connection between the
nodes (not via Heartbeat), I get the following state:

# modprobe drbd
[root@cpvmdc1 conf.d]# drbdadm up r0
[root@cpvmdc1 conf.d]# drbdadm primary r0
[root@cpvmdc1 conf.d]# cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@cpvmdc1,
2011-12-21 17:13:05
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C rs
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:200736


Now, I can see the node listening on port 7788:

[root@cpvmdc1 private]# netstat -tuplen | grep 7788
tcp0  0 192.168.1.100:7788  0.0.0.0:*
LISTEN  0  290311 -


On the secondary:

[root@cpvmdc2 ~]# modprobe drbd
[root@cpvmdc2 ~]# drbdadm up r0
[root@cpvmdc2 ~]# cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@cpvmdc1,
2011-12-21 17:13:05
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   rs
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:200736
[root@cpvmdc2 ~]# drbdadm connect r0
[root@cpvmdc2 ~]# cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@cpvmdc1,
2011-12-21 17:13:05
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   rs
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:200736

but after the "connect" command on the secondary node, the node stays in
StandAlone and I can't see port 7788 open on it. Also the connection state
of the primary changes from WFConnection to StandAlone:

[root@cpvmdc1 conf.d]# cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@cpvmdc1,
2011-12-21 17:13:05
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   rs
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:200736

and it gives up trying to connect.
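
If it helps, the kernel log from around the moment the connection drops to
StandAlone should say why; this is what I would capture next on both nodes
(standard tools only, nothing DRBD-specific assumed):

dmesg | grep -i drbd | tail -30
grep drbd /var/log/messages | tail -30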


My drbd resource config file looks like this now:

[root@cpvmdc1 conf.d]# cat /etc/drbd.d/r0.res
resource r0 {
meta-disk internal;
device minor 0;
net {
protocol C;
}
disk { on-io-error detach; }
syncer {
rate 100M;
verify-alg md5;
}
on cpvmdc1 {
device /dev/drbd0;
address 192.168.1.100:7788;
disk /dev/sdc1;
meta-disk  internal;
}
on cpvmdc2 {
device /dev/drbd0;
address 192.168.1.101:7788;
disk /dev/sdc1;
meta-disk  internal;
}
}


When I run the "syncer" option for drbdadm I get:

[root@cpvmdc1 conf.d]# drbdadm syncer r0
drbdadm: Unknown command 'syncer'

Am I still missing something?
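
Could it be that in 8.4 the old syncer step no longer exists as a separate
command, and the equivalent on each node is simply something like the
following? (This is a guess on my part, not something I have confirmed.)

drbdadm adjust r0
drbdadm connect r0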

Thanks again for your help

Igor

On Fri, Dec 23, 2011 at 7:27 PM, Felix Frank  wrote:

> Hi,
>
> On 12/23/2011 03:37 AM, Igor Cicimov wrote:
> > Thanks to both of you guys for your help. Now the error is gone but they
> > are still in StandAlone mode instead of connected:
>
> did you start DRBD by initscript? Looks as though you used "drbdadm
> attach" on each node. Don't do that.
> You should use "drbdadm up" to activate a resource.
>
> Now that the disks are already attached, you can just do
> "drbdadm syncer" then "drbdadm connect", on each node respectively.
>
> > It's also strange
> > that I'm sure I formatted the drbd partition with ext3 but it is showing
> > as ext2 in mount:
>
> Ext3 is downwards compatible. You can mount ext3 partitions as ext2, and
> your kernel will refrain from using the ext3 journal.
>
> Is there an fstab entry for drbd0? Does it specify ext2 as filesystem?
> Or did you mount that manually?
>
> Either way, just be sure that mount knows the FS is supposed to have a
> journal, and this should Just Work.
>
> HTH,
> Felix
>
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD startup error - r0: UnknownMandatoryTag

2011-12-22 Thread Igor Cicimov
Thanks to both of you guys for your help. Now the error is gone but they
are still in StandAlone mode instead of connected:


[root@cpvmdc1 httpd]# service drbd status

drbd driver loaded OK; device status:

version: 8.4.0 (api:1/proto:86-100)

GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@cpvmdc1,
2011-12-21 17:13:05
m:res  cs  ro   ds p   mounted
fstype

0:r0   StandAlone  Primary/Unknown  UpToDate/DUnknown  rs  ext2

[root@cpvmdc2 ~]# service drbd status
drbd driver loaded OK; device status:

version: 8.4.0 (api:1/proto:86-100)

GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@cpvmdc1,
2011-12-21 17:13:05
m:res  cs  ro ds p   mounted
fstype
0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  rs


Is there anything else I'm missing? It's also strange that I'm sure I
formatted the drbd partition with ext3 but it is showing as ext2 in mount:

[root@cpvmdc1 httpd]# mount
/dev/drbd0 on /drbd0 type ext2 (rw)
.
.
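
If it matters, the filesystem's feature list should settle whether the ext3
journal is really there; tune2fs is from e2fsprogs, and "has_journal" is the
flag to look for:

tune2fs -l /dev/drbd0 | grep -i features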



Thanks,
Igor

On Thu, Dec 22, 2011 at 8:42 PM, Rasto Levrinc wrote:

> On Thu, Dec 22, 2011 at 6:25 AM, Igor Cicimov  wrote:
>
> > version: 8.4.0 (api:1/proto:86-100)
>
>
> Also you can (should) upgrade to 8.4.1, if you need 8.4, since it has
> been fixed:
>
>
> http://git.drbd.org/gitweb.cgi?p=drbd-8.4.git;a=commitdiff;h=63a8a23bef7f421775b9b79ba72baf60f97005a7;hp=251bd906213bb7c4a4abf91076fd0bf6116295ab
>
> Rasto
>
> --
> Dipl.-Ing. Rastislav Levrinc
> rasto.levr...@gmail.com
> Linux Cluster Management Console
> http://lcmc.sf.net/
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD startup error - r0: UnknownMandatoryTag

2011-12-21 Thread Igor Cicimov
Hi all,

I'm trying to set up HA with DRBD and Heartbeat on two identical Red Hat
VMs hosted on VirtualBox on Windows. The details about the servers' DRBD
build and config are given below:

[root@cpvmdc1 ~]# uname -a
Linux cpvmdc1 2.6.18-229.el5 #1 SMP Tue Oct 26 18:54:44 EDT 2010 x86_64
x86_64 x86_64 GNU/Linux

[root@cpvmdc1 ~]# cat /etc/drbd.d/r0.res
resource r0 {
meta-disk internal;
device minor 0;
on cpvmdc1 {
address 192.168.1.100:7788;
disk /dev/sdc1;
}
on cpvmdc2 {
address 192.168.1.101:7788;
disk /dev/sdc1;
}
}

[root@cpvmdc1 ~]# ls -l /dev/drbd0
brw-r- 1 root disk 147, 0 Dec 22 15:57 /dev/drbd0

[root@cpvmdc1 ~]# fdisk -l /dev/sdc
Disk /dev/sdc: 209 MB, 209715200 bytes
255 heads, 63 sectors/track, 25 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1  25  200781   83  Linux
[root@cpvmdc1 ~]# ls -l /usr/src/redhat/RPMS/x86_64/
total 2340
-rw-r--r-- 1 root root   24730 Dec 21 17:08 drbd-8.4.0-1.x86_64.rpm
-rw-r--r-- 1 root root5822 Dec 21 17:08
drbd-bash-completion-8.4.0-1.x86_64.rpm
-rw-r--r-- 1 root root  661826 Dec 21 17:08
drbd-debuginfo-8.4.0-1.x86_64.rpm
-rw-r--r-- 1 root root7696 Dec 21 17:08
drbd-heartbeat-8.4.0-1.x86_64.rpm
-rw-r--r-- 1 root root 1287825 Dec 21 17:13
drbd-km-2.6.18_229.el5-8.4.0-1.x86_64.rpm
-rw-r--r-- 1 root root3463 Dec 21 17:13
drbd-km-debuginfo-8.4.0-1.x86_64.rpm
-rw-r--r-- 1 root root   21529 Dec 21 17:08
drbd-pacemaker-8.4.0-1.x86_64.rpm
-rw-r--r-- 1 root root4518 Dec 21 17:08 drbd-udev-8.4.0-1.x86_64.rpm
-rw-r--r-- 1 root root  336012 Dec 21 17:08 drbd-utils-8.4.0-1.x86_64.rpm
-rw-r--r-- 1 root root7101 Dec 21 17:08 drbd-xen-8.4.0-1.x86_64.rpm

[root@cpvmdc1 ~]# ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 08:00:27:AF:8A:CE
  inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:4490 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4465 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:1092709 (1.0 MiB)  TX bytes:1081716 (1.0 MiB)

[root@cpvmdc1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@cpvmdc1,
2011-12-21 17:13:05
m:res  cs  ro   ds p   mounted
fstype
0:r0   StandAlone  Primary/Unknown  UpToDate/DUnknown  rs


All is fine except the following error/problem that I get when starting the
drbd service:

On the would-be "primary":

[root@cpvmdc1 ~]# service drbd start
Starting DRBD resources: [
 create res: r0
   prepare disk: r0
adjust disk: r0
 adjust net: r0:failed(connect:10)
]
.
[root@cpvmdc1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@cpvmdc1,
2011-12-21 17:13:05
m:res  cs  ro ds p   mounted
fstype
0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  rs


On the "secondary":

[root@cpvmdc2 ~]# service drbd start
Starting DRBD resources: [
 create res: r0
   prepare disk: r0
adjust disk: r0
 adjust net: r0:failed(connect:10)
]
.
[root@cpvmdc2 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@cpvmdc1,
2011-12-21 17:13:05
m:res  cs  ro ds p   mounted
fstype
0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  rs


For some reason they fail to connect. I have no firewall between/on the
servers and they can ping each other. Trying to connect manually gives me
the following error:

[root@cpvmdc1 ~]# drbdsetup connect r0 ipv4:192.168.1.100:7788 ipv4:
192.168.1.101:7788
r0: Failure: (126) UnknownMandatoryTag
additional info from kernel:

Any idea what this "UnknownMandatoryTag" means?
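
The "additional info from kernel" line comes back empty for me, so in case
there is more detail hiding in the kernel log, this is what I'm looking at
right after the failed connect (standard tools only):

dmesg | grep -i drbd | tail -20
grep drbd /var/log/messages | tail -20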

Thanks in advance for any help or advice.
Igor
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user