Re: [DRBD-user] Configuring a two-node cluster with redundant nics on each node?

2018-10-17 Thread Adi Pircalabu

On 2018-10-18 04:07, Bryan K. Walton wrote:

Hi,

I'm trying to configure a two-node cluster, where each node has
dedicated redundant nics:

storage node 1 has two private IPs:
10.40.1.3
10.40.2.2

storage node 2 has two private IPs:
10.40.1.2
10.40.2.3

I'd like to configure the resource so that the nodes have two possible
paths to the other node.  I've tried this:

resource r0 {
  on storage1 {
    device    /dev/drbd1;
    disk      /dev/mapper/centos_storage1-storage;
    address   10.40.2.2:7789;
    address   10.40.1.3:7789;
    meta-disk internal;
  }
  on storage2 {
    device    /dev/drbd1;
    disk      /dev/mapper/centos_storage2-storage;
    address   10.40.1.2:7789;
    address   10.40.2.3:7789;
    meta-disk internal;
  }
}

But this doesn't work.  When I try to create the device metadata, I get
the following error:

drbd.d/r0.res:6: conflicting use of address statement
'r0:storage1:address' ...
drbd.d/r0.res:5: address statement 'r0:storage1:address' first used
here.

Clearly, my configuration won't work.  Is there a way to accomplish what
I'm trying to do?


Why aren't you using Ethernet bonding?
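
Something along these lines, assuming a CentOS 7 style ifcfg setup; the
interface names, bond mode and which of your addresses you keep are only
examples:

/etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    TYPE=Bond
    BONDING_MASTER=yes
    BONDING_OPTS="mode=active-backup miimon=100"
    IPADDR=10.40.1.3
    PREFIX=24
    ONBOOT=yes
    BOOTPROTO=none

/etc/sysconfig/network-scripts/ifcfg-eth1   (and the same for the second NIC)
    DEVICE=eth1
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

DRBD 8.4 accepts exactly one address statement per host, so with 8.4 the
resource then simply points at the bond's IP:

    on storage1 {
        device    /dev/drbd1;
        disk      /dev/mapper/centos_storage1-storage;
        address   10.40.1.3:7789;
        meta-disk internal;
    }

With mode=active-backup the two NICs give you link redundancy without any
switch-side configuration.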

--
Adi Pircalabu


[DRBD-user] Configuring a two-node cluster with redundant nics on each node?

2018-10-17 Thread Bryan K. Walton
Hi,

I'm trying to configure a two-node cluster, where each node has
dedicated redundant nics:

storage node 1 has two private IPs:
10.40.1.3
10.40.2.2

storage node 2 has two private IPs:
10.40.1.2
10.40.2.3

I'd like to configure the resource so that the nodes have two possible
paths to the other node.  I've tried this:

resource r0 {
  on storage1 {
    device    /dev/drbd1;
    disk      /dev/mapper/centos_storage1-storage;
    address   10.40.2.2:7789;
    address   10.40.1.3:7789;
    meta-disk internal;
  }
  on storage2 {
    device    /dev/drbd1;
    disk      /dev/mapper/centos_storage2-storage;
    address   10.40.1.2:7789;
    address   10.40.2.3:7789;
    meta-disk internal;
  }
}

But this doesn't work.  When I try to create the device metadata, I get
the following error:

drbd.d/r0.res:6: conflicting use of address statement
'r0:storage1:address' ...
drbd.d/r0.res:5: address statement 'r0:storage1:address' first used
here.

Clearly, my configuration won't work.  Is there a way to accomplish what
I'm trying to do?
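
(For reference, DRBD 9 seems to support exactly this through multiple "path"
sections inside a "connection" section; the sketch below is adapted from the
DRBD 9 documentation and is untested on my side:)

resource r0 {
  on storage1 {
    node-id   0;
    device    /dev/drbd1;
    disk      /dev/mapper/centos_storage1-storage;
    meta-disk internal;
  }
  on storage2 {
    node-id   1;
    device    /dev/drbd1;
    disk      /dev/mapper/centos_storage2-storage;
    meta-disk internal;
  }
  connection {
    path {
      host storage1 address 10.40.1.3:7789;
      host storage2 address 10.40.1.2:7789;
    }
    path {
      host storage1 address 10.40.2.2:7789;
      host storage2 address 10.40.2.3:7789;
    }
  }
}

As far as I understand, with the TCP transport only one path is active at a
time, so this gives failover between the links rather than aggregation.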

Thanks,
Bryan Walton


Re: [DRBD-user] drbdadm down failed (-12) - blocked by drbd_submit

2018-10-17 Thread Radoslaw Garbacz
It turned out that the NFS daemon was blocking DRBD.
Thanks, the comment about the 'drbd' kernel processes was helpful.

BTW, the documentation (man pages) for DRBD 9.0 is still from 8.4 and some
options are no longer there.
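
In case it helps anyone else, the kind of checks that pointed me at NFS were
roughly the following (device, resource and service names are from my setup
and may differ on yours):

# what actually holds the device open (the drbd*_submit kernel threads don't count)
fuser -v /dev/drbd0
lsof /dev/drbd0

# any leftover device-mapper mappings stacked on top of the DRBD device?
ls /sys/block/drbd0/holders/
dmsetup deps 2>/dev/null | grep -i drbd

# in my case the culprit was the NFS server
exportfs -v
systemctl stop nfs-server     # after this, 'drbdadm down data0' succeeded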


On Thu, Oct 11, 2018 at 9:48 AM, Radoslaw Garbacz <
radoslaw.garb...@xtremedatainc.com> wrote:

> Thanks, will take a closer look at this.
>
> On Thu, Oct 11, 2018 at 3:47 AM, Lars Ellenberg wrote:
>
>> On Tue, Oct 02, 2018 at 12:56:38PM -0500, Radoslaw Garbacz wrote:
>> > Hi,
>> >
>> >
>> > I have a problem which (from what I found) has been discussed, but not
>> > in the particular case I ran into, so I would be grateful for any
>> > suggestions on how to deal with it.
>> >
>> >
>> > I.
>> > 1. I get an error when demoting DRBD resource:
>> > * drbdadm down data0
>> >
>> > data0: State change failed: (-12) Device is held open by someone
>> > additional info from kernel:
>> > failed to demote
>> > Command 'drbdsetup-84 down data0' terminated with exit code 11
>> >
>> > 2. The device is not mounted and not used by any LVM, so based on some
>> > online discussions I checked the blocking process and it is
>> > "drbd0_submit"
>> >
>> > * lsof | grep drbd0
>> > drbd0_sub 16687 root  cwd   DIR  202,1 251 64 /
>>
>> No, it is not.
>>
>> drbd*submitter (only 16 bytes of that name actually make it into the
>> comm part of the task struct, which is what ps or lsof or the like can
>> display) are kernel threads, and part of DRBD operations.
>> They are certainly NOT "holding it open".
>> They are a required part of its existence.
>>
>> "Holding it open" when you think you already unmounted it
>> is typically either some forgotten device mapper thingy
>> (semi-automatically created by kpartx e.g.),
>> or some racy "udev triggered probe".
>>
>> In the latter case, if you retry after a couple seconds,
>> demoting should work.
>>
>> > Is there a good way to deal with this case? Is some DRBD step missing
>> > that leaves the process behind, or is killing the process the right
>> > way?
>>
>> Again, that "process" has nothing to do with drbd being "held open",
>> but is a kernel thread that is part of the existence of that DRBD volume.
>>
>> --
>> : Lars Ellenberg
>> : LINBIT | Keeping the Digital World Running
>> : DRBD -- Heartbeat -- Corosync -- Pacemaker
>>
>> DRBD® and LINBIT® are registered trademarks of LINBIT
>> __
>> please don't Cc me, but send to list -- I'm subscribed
>> ___
>> drbd-user mailing list
>> drbd-user@lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>
>
>
> --
> Best Regards,
>
> Radoslaw Garbacz
> XtremeData Incorporated
>



-- 
Best Regards,

Radoslaw Garbacz
XtremeData Incorporated


Re: [DRBD-user] split brain on both nodes

2018-10-17 Thread Digimer
On 2018-10-17 5:35 a.m., Adam Weremczuk wrote:
> Hi all,
> 
> Yesterday I rebooted both nodes a couple of times (replacing BBU RAID
> batteries) and ended up with:
> 
> drbd0: Split-Brain detected but unresolved, dropping connection!

Fencing prevents this.

> on both.
> 
> node1: drbd-overview
> 0:r0/0  StandAlone Primary/Unknown UpToDate/DUnknown /srv/test1 ext4
> 3.6T 75G 3.4T 3%
> 
> node2: drbd-overview
> 0:r0/0  StandAlone Secondary/Unknown UpToDate/DUnknown
> 
> I understand there is a good chance (but not absolute guarantee) that
> node1 holds consistent and up to date data.
> 
> Q1:
> 
> Is it reasonably possible to mount /dev/drbd0 (/dev/sdb1) on node2 in
> read only mode?
> 
> I would like to examine the data before discarding and syncing
> everything from node1.

Yes. You can also promote node 2 and examine it there.

> drbdadm disconnect all
> drbdadm -- --discard-my-data connect all

Discarding the data will trigger a resync and resolve the split-brain,
but of course, any changes on the discarded node will be lost.
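
A minimal sketch of that recovery, assuming node1 is the survivor and the
resource is r0 (the mount point below is just an example):

# node2: optionally inspect the data read-only first, while StandAlone
drbdadm primary r0
mount -o ro /dev/drbd0 /mnt
# ... look around ...
umount /mnt
drbdadm secondary r0

# node2: discard its changes and reconnect
drbdadm -- --discard-my-data connect r0

# node1: reconnect if it is also StandAlone
drbdadm connect r0

# watch the resync on either node
watch cat /proc/drbd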

> Q2:
> 
> Will the above completely purge all data on node2 or just drbd metadata?
> 
> I.e. will all 75G have to be fully copied block by block or a lot less?

It will do a full resync.

> I'm concerned about time and impact on performance when it comes to
> terabytes of data.
> 
> Regards,
> Adam

The resync (on 8.4) adapts the resync rate to minimize impact on
applications using the storage. As it slows itself down to "stay out of
the way", the resync time increases of course. You won't have redundancy
until the resync completes.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould


Re: [DRBD-user] slow sync speed

2018-10-17 Thread Yannis Milios
Just a quick note ..

> You are correct, it shouldn't be required (v8.9.10) and I was surprised
> with that too.
>

In the DRBD documentation, it is stated that ...

"When multiple DRBD resources share a single replication/synchronization
network, synchronization with a fixed rate may not be an optimal approach.
So, in DRBD 8.4.0 the variable-rate synchronization was enabled by default."

..and..

"In a few, very restricted situations[4], it might make sense to just use
some fixed synchronization rate. In this case, first of all you need to
turn the dynamic sync rate controller off, by using c-plan-ahead 0;."

..and looking at your configuration, it appears you added that option from
the first version, hence no surprises here: you explicitly decided to
disable the variable sync rate ... :)
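
For reference, the two approaches look roughly like this in 8.4 syntax (the
numbers are only examples, not recommendations):

disk {
    # fixed rate: controller off, rate pinned
    c-plan-ahead 0;
    resync-rate  150M;
}

disk {
    # dynamic controller (the 8.4 default behaviour)
    c-plan-ahead  20;
    c-fill-target 1M;
    c-min-rate    30M;
    c-max-rate    150M;
}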


[DRBD-user] split brain on both nodes

2018-10-17 Thread Adam Weremczuk

Hi all,

Yesterday I rebooted both nodes a couple of times (replacing BBU RAID 
batteries) and ended up with:


drbd0: Split-Brain detected but unresolved, dropping connection!

on both.

node1: drbd-overview
0:r0/0  StandAlone Primary/Unknown UpToDate/DUnknown /srv/test1 ext4
3.6T 75G 3.4T 3%

node2: drbd-overview
0:r0/0  StandAlone Secondary/Unknown UpToDate/DUnknown

I understand there is a good chance (but not absolute guarantee) that 
node1 holds consistent and up to date data.


Q1:

Is it reasonably possible to mount /dev/drbd0 (/dev/sdb1) on node2 in 
read only mode?


I would like to examine the data before discarding and syncing 
everything from node1.


drbdadm disconnect all
drbdadm -- --discard-my-data connect all

Q2:

Will the above completely purge all data on node2 or just drbd metadata?

I.e. will all 75G have to be fully copied block by block or a lot less?

I'm concerned about time and impact on performance when it comes to 
terabytes of data.


Regards,
Adam



Re: [DRBD-user] slow sync speed

2018-10-17 Thread Adam Weremczuk
You are correct, it shouldn't be required (v8.9.10) and I was surprised
with that too.
Further evidence that the option is being honoured is the "want: 150,000
k/sec" line which I sometimes (not always) see in /proc/drbd.



On 17/10/18 10:17, Oleksiy Evin wrote:
If I'm not wrong, the "syncer" section was deprecated somewhere around
DRBD 8.4.0. Based on the logs you provided, the version you use is 8.4.10,
so I don't think that section should have any speed impact. But I'm glad
you've got it resolved.


//OE




Re: [DRBD-user] slow sync speed

2018-10-17 Thread Oleksiy Evin
If I'm not wrong, the "syncer" section was deprecated somewhere around
DRBD 8.4.0. Based on the logs you provided, the version you use is 8.4.10,
so I don't think that section should have any speed impact. But I'm glad
you've got it resolved.

//OE

-Original Message-
From: Adam Weremczuk 
To: Robert Altnoeder 
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] slow sync speed
Date: Wed, 17 Oct 2018 10:05:10 +0100

"Max-buffers 8k" appear to be the sweet spot for me.I'm now getting 145-150 
MB/s transfer rates between nodes which I'm happy with.The biggest problem was 
I didn't have "syncer" section defined at all.
Currently my fully working and behaving config looks like below:
global { usage-count no; }common { protocol C; }resource r0 {   disk { 
on-io-error detach; no-disk-flushes; no-disk-barrier; c-plan-ahead 
0;   }   net { max-buffers 8k;   }   syncer { rate 150M; al-extents 
6400;   }   on lion { device /dev/drbd0; disk /dev/sdb1; address 
192.168.200.1:7788; meta-disk internal;   }   on tiger { device 
/dev/drbd0; disk /dev/sdb1; address 192.168.200.2:7788; meta-disk 
internal;   }}
On 11/10/18 15:06, Robert Altnoeder wrote:On 10/11/2018 03:56 PM, Oleksiy Evin 
wrote:Try to remove the following:
c-fill-target 24M;c-min-rate 80M;c-max-rate 720M;
sndbuf-size 1024k;rcvbuf-size 2048k;
Then gradually increase max-buffers from 4K to 12K checking its impactto the 
sync speed. Make sure you have the same config on both nodesand apply the 
changes with "drbdadm adjust all" on both nodes too.





Re: [DRBD-user] slow sync speed

2018-10-17 Thread Adam Weremczuk

"Max-buffers 8k" appear to be the sweet spot for me.
I'm now getting 145-150 MB/s transfer rates between nodes which I'm 
happy with.

The biggest problem was I didn't have "syncer" section defined at all.

Currently my fully working and behaving config looks like below:

global { usage-count no; }
common { protocol C; }
resource r0 {
  disk {
    on-io-error detach;
    no-disk-flushes;
    no-disk-barrier;
    c-plan-ahead 0;
  }
  net {
    max-buffers 8k;
  }
  syncer {
    rate 150M;
    al-extents 6400;
  }
  on lion {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 192.168.200.1:7788;
    meta-disk internal;
  }
  on tiger {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 192.168.200.2:7788;
    meta-disk internal;
  }
}
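
For anyone tweaking the same options, a rough apply-and-verify sequence (run
on both nodes after editing the config):

drbdadm dump r0      # sanity-check that the configuration parses as intended
drbdadm adjust r0    # apply the changed options to the running resource
cat /proc/drbd       # during a resync, shows the current and "want:" rates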

On 11/10/18 15:06, Robert Altnoeder wrote:

On 10/11/2018 03:56 PM, Oleksiy Evin wrote:

Try to remove the following:

c-fill-target 24M;
c-min-rate 80M;
c-max-rate 720M;

sndbuf-size 1024k;
rcvbuf-size 2048k;

Then gradually increase max-buffers from 4K to 12K, checking its impact
on the sync speed. Make sure you have the same config on both nodes
and apply the changes with "drbdadm adjust all" on both nodes too.






[DRBD-user] drbd-9.0.16rc1

2018-10-17 Thread Philipp Reisner
Hi,

the list is shorter than with the last releases. I think this is good news.

What really made us release now is fixing the regression introduced with
9.0.15. It was probably not hit by many parties, because you can only
trigger it if you have requests in flight at exactly the moment a
timer fires to check whether the network timeout has expired.

The distributed connect loop was never seen in the wild, maybe only 
our test suite ever reproduced it.

The fixes to the quorum code ensure that recovery works as expected
after a primary node lost quorum.

Please help testing! -- We will release in one week if nobody comes up
with "interesting" behavior. We will use the time to write more
test cases for our test suite.

9.0.16-0rc1 (api:genl2/proto:86-114/transport:14)

 * Fix regression (introduced with 9.0.15) in handling request timeouts;
   all pending requests were considered overdue whenever the timer function
   executed, which led to false positives in detecting timeouts
 * Fix a possible distributed loop when establishing a connection
 * Fix a corner case where one resync "overtakes" another
 * Fix clearing of the PRIMARY_LOST_QUORUM flag
 * Check peers (to ensure quorum is not lost) before generating a new current
   UUID after losing a node
 * In case the locally configured address of a connection is not
   available, keep retrying until it comes back

http://www.linbit.com/downloads/drbd/9.0/drbd-9.0.16-0rc1.tar.gz
https://github.com/LINBIT/drbd-9.0/releases/tag/drbd-9.0.16-0rc1
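
A rough build-and-load sketch for the tarball (kernel headers for the running
kernel are required; the KDIR value below is the usual default and only an
example):

tar xzf drbd-9.0.16-0rc1.tar.gz
cd drbd-9.0.16-0rc1
make KDIR=/lib/modules/$(uname -r)/build
sudo make install
sudo modprobe drbd
cat /proc/drbd        # should report version: 9.0.16-0rc1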

best regards,
 Phil
-- 
LINBIT | Keeping The Digital World Running

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


