Re: [DRBD-user] Linstor/DRBD Documentation Gap: Using Multiple Network Paths

2022-06-09 Thread Eric Robinson
Thank you, I will check it out ASAP!

-Eric


From: drbd-user-boun...@lists.linbit.com  
On Behalf Of Michael Troutman
Sent: Thursday, June 2, 2022 10:16 AM
To: drbd-user email list 
Subject: Re: [DRBD-user] Linstor/DRBD Documentation Gap: Using Multiple Network 
Paths

Hello Eric,

Thanks for mentioning the gap in the LINSTOR documentation here. The `PrefNic` 
property that you used will not create path entries in a DRBD resource 
configuration file. Instead, use the `linstor node interface` and `linstor 
resource-connection path` commands.

We added sub-section 2.7.1, Creating Multiple DRBD Paths with LINSTOR, to the 
LINSTOR User's Guide with more details and example commands.

Available here: 
https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-creating-multiple-drbd-paths
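For readers skimming the archive, the commands that sub-section describes can be sketched roughly like this, reusing the node, interface, and resource names from Eric's original message below (the exact argument order is best confirmed with `linstor resource-connection path create -h`):

```shell
# Interfaces nic_core and nic_repl already exist on ha52a-cl and ha52b-cl
# from the earlier attempt. Create one DRBD path per interface pair for
# the resource site436:
linstor resource-connection path create ha52a-cl ha52b-cl site436 core_path nic_core nic_core
linstor resource-connection path create ha52a-cl ha52b-cl site436 repl_path nic_repl nic_repl
```

After this, the generated .res file should contain one `path` section per created path.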

Hope this helps you.

Michael.

On 5/25/22 11:50, drbd-user-requ...@lists.linbit.com wrote:

We tried to follow the Linstor documentation for setting up multiple network 
paths, but there seems to be a step missing.

[...]

What did we miss? There seems to be a step missing in the documentation where 
it tells you how to make DRBD use both NICs and not just the preferred one.

-Eric

--
Michael Troutman | LINBIT Documentation Specialist
America/Detroit (EST, -0500)
Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.
___
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Linstor/DRBD Documentation Gap: Using Multiple Network Paths

2022-05-25 Thread Eric Robinson
We tried to follow the Linstor documentation for setting up multiple network 
paths, but there seems to be a step missing.

First, we created 2 NICs on each node...


# linstor node interface create ha52a-cl nic_core 192.168.8.100
# linstor node interface create ha52a-cl nic_repl 198.51.100.100

# linstor node interface create ha52b-cl nic_core 192.168.8.102
# linstor node interface create ha52b-cl nic_repl 198.51.100.102

Then we set the preferred NIC for the storage pools...


# linstor storage-pool set-property ha52a-cl spool2 PrefNic nic_core

# linstor storage-pool set-property ha52b-cl spool2 PrefNic nic_core

Then we created the resource...

# linstor resource-group spawn-resources rgroup2 site436 603G

The resource was created successfully, but the .res file only shows 
communication happening on one NIC.

connection
{
    disk
    {
        c-fill-target 1048576;
        c-max-rate 2048000;
    }
    host ha52a.mycharts.md address ipv4 198.51.100.100:7415;
    host ha52b.mycharts.md address ipv4 198.51.100.102:7415;
}

Per the DRBD 9 docs, there should be "path" entries like this...


resource site436 {
  ...
  connection {
    path {
      host ha52a-cl address 192.168.8.100:7900;
      host ha52b-cl address 192.168.8.102:7900;
    }
    path {
      host ha52a-cl address 198.51.100.100:7900;
      host ha52b-cl address 198.51.100.102:7900;
    }
  }
  ...
}

What did we miss? There seems to be a step missing in the documentation where 
it tells you how to make DRBD use both NICs and not just the preferred one.

-Eric








Re: [DRBD-user] Running Different Versions of drbd / kmod-drbd on different nodes

2022-05-25 Thread Eric Robinson
Thanks, Roland.

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com On Behalf Of Roland Kammerer
> Sent: Wednesday, May 25, 2022 1:58 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] Running Different Versions of drbd / kmod-drbd on
> different nodes
>
> On Tue, May 24, 2022 at 04:43:18PM +, Eric Robinson wrote:
> > If a cluster has 4 data nodes, and they are deployed in such a way
> > that Nodes 1 & 2 replicate only to each other, and Nodes 3 & 4
> > replicate only to each other, is it okay to run Nodes 1 & 2 on a
> > different version of drbd and kmod-drbd than nodes 3 & 4?
>
> Should be fine in general. I mean, if you upgrade nodes, and you upgrade
> node 1 and then reconnect, then you also have 2 different versions, and
> that is of course a supported upgrade scenario. And in your case they
> don't even talk to each other. Anyway, when a resource connects to a peer
> there is a handshake and their min and max protocol versions get
> exchanged; as long as they have some common ground the connection
> gets established, otherwise it is aborted. Fun fact: there is current work
> so that the latest DRBD 9 can even talk to 8.4 again.
>
> Regards, rck


Re: [DRBD-user] Running Different Versions of drbd / kmod-drbd on different nodes

2022-05-24 Thread Eric Robinson
I should say a different sub-version, both within the drbd 9 series.

-Eric


From: drbd-user-boun...@lists.linbit.com  
On Behalf Of Eric Robinson
Sent: Tuesday, May 24, 2022 11:43 AM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] Running Different Versions of drbd / kmod-drbd on 
different nodes

If a cluster has 4 data nodes, and they are deployed in such a way that Nodes 1 
& 2 replicate only to each other, and Nodes 3 & 4 replicate only to each other, 
is it okay to run Nodes 1 & 2 on a different version of drbd and kmod-drbd than 
nodes 3 & 4?

-Eric





[DRBD-user] Running Different Versions of drbd / kmod-drbd on different nodes

2022-05-24 Thread Eric Robinson
If a cluster has 4 data nodes, and they are deployed in such a way that Nodes 1 
& 2 replicate only to each other, and Nodes 3 & 4 replicate only to each other, 
is it okay to run Nodes 1 & 2 on a different version of drbd and kmod-drbd than 
nodes 3 & 4?

-Eric





[DRBD-user] DRBD Behavior on Device Failure, and Recovery Procedure

2022-04-28 Thread Eric Robinson
Our servers have a large number of resources on a 6-drive volume group. When 
Linstor provisioned the resources, it apparently kept them all on individual 
devices. Here's a snippet of the approximately 200 resources on the servers. 
None of them show more than 1 device in the "Devices" column.

[root@ha51b ~]# lvs -o+lv_layout,stripes,devices
  LV        VG  Attr   LSize    Layout #Str Devices
  site002_0 vg0 -wi-ao  104.02g linear    1 /dev/nvme3n1(371535)
  site003_0 vg0 -wi-ao  <63.02g linear    1 /dev/nvme0n1(498558)
  site017_0 vg0 -wi-ao <149.04g linear    1 /dev/nvme3n1(724396)
  site019_0 vg0 -wi-ao  <19.01g linear    1 /dev/nvme4n1(0)
  site021_0 vg0 -wi-ao  <23.01g linear    1 /dev/nvme2n1(698275)
  site030_0 vg0 -wi-ao   39.01g linear    1 /dev/nvme2n1(704165)
  site034_0 vg0 -wi-ao  <23.01g linear    1 /dev/nvme3n1(713896)
  site035_0 vg0 -wi-ao   39.01g linear    1 /dev/nvme0n1(254527)
  site036_0 vg0 -wi-ao  <88.02g linear    1 /dev/nvme2n1(714152)
  site037_0 vg0 -wi-ao  <28.01g linear    1 /dev/nvme0n1(530822)
  site039_0 vg0 -wi-ao  <59.02g linear    1 /dev/nvme1n1(180777)
  site041_0 vg0 -wi-ao  <21.01g linear    1 /dev/nvme3n1(181290)
  site043_0 vg0 -wi-ao   50.01g linear    1 /dev/nvme3n1(398165)
  site045_0 vg0 -wi-ao   52.01g linear    1 /dev/nvme1n1(203567)
  site047_0 vg0 -wi-ao   54.01g linear    1 /dev/nvme0n1(264514)
  site049_0 vg0 -wi-ao  <81.02g linear    1 /dev/nvme3n1(410968)
  site058_0 vg0 -wi-ao  <30.01g linear    1 /dev/nvme0n1(564622)
  site062_0 vg0 -wi-ao   17.00g linear    1 /dev/nvme3n1(197679)
  site065_0 vg0 -wi-ao  <23.01g linear    1 /dev/nvme1n1(387935)
  site068_0 vg0 -wi-ao  <32.01g linear    1 /dev/nvme0n1(616090)


With this layout (all LVs are linear), when a drive fails, I assume only the 
resources on that physical drive would go diskless, and all the other resources 
would continue operating normally, is that correct?

In such an event, what would be the recovery procedure? Swap the failed drive, 
use vgcfgrestore to restore the LVM metadata to the new PV, then do a DRBD resync?
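A hedged sketch of that recovery sequence, assuming a linear LVM layout and taking /dev/nvme2n1 as the failed drive (device names and the resource name are illustrative, and this is an untested outline, not a verified procedure):

```shell
# 1. After swapping the drive, recreate the PV with its old UUID
#    (the UUID and backup file come from /etc/lvm/backup or /etc/lvm/archive)
pvcreate --uuid "<old-pv-uuid>" --restorefile /etc/lvm/backup/vg0 /dev/nvme2n1

# 2. Restore the volume group metadata onto the new PV
vgcfgrestore vg0

# 3. Reactivate the affected logical volumes
lvchange -ay vg0

# 4. Mark the local data stale so DRBD performs a full resync from the peer
#    (repeat per resource that lived on the failed drive)
drbdadm invalidate <resource>
```

Whether step 4 is needed, or whether DRBD detects the outdated data on reconnect by itself, depends on the state after the failure; verify against the DRBD User's Guide before relying on this.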

-Eric



Re: [DRBD-user] Adding a Physical Disk to a Storage Pool

2022-04-28 Thread Eric Robinson
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com On Behalf Of Roland Kammerer
> Sent: Thursday, April 28, 2022 9:47 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] Adding a Physical Disk to a Storage Pool
>
> On Thu, Apr 28, 2022 at 02:41:53PM +, Eric Robinson wrote:
> > There are about 200 resources between the two nodes.  'systemctl
> > restart linstor-satellite' does not touch the running resources or
> > disrupt service, right?
>
> There are no interruptions. That is the lovely part when control and data
> plane are separated.
>

Turns out there was no need. I added a small resource as a test and the storage 
pool numbers now show correctly. You were right, thanks!

This does raise a related question, though. I only recently read the part of 
the Linstor User Guide where it talks about confining failure domains by 
creating one storage pool for each backend device (section 1.10.1). I wish I 
had known that sooner. When we created our volume groups, we did it like this...

# vgcreate vg0 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1

It's all one big volume group and we let Linstor provision resources wherever 
it wants. What is the effect if one NVMe drive fails? Do all resources on the 
node go diskless, or just the resources that happen to live entirely on the 
failed drive?
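For comparison, the one-storage-pool-per-device layout that section 1.10.1 of the LINSTOR User's Guide recommends for confining failure domains might be set up roughly like this (the VG and pool names here are invented for illustration):

```shell
# One volume group per backend NVMe device
vgcreate vg_nvme0 /dev/nvme0n1
vgcreate vg_nvme1 /dev/nvme1n1

# One LINSTOR storage pool per volume group
linstor storage-pool create lvm ha50a-cl pool_nvme0 vg_nvme0
linstor storage-pool create lvm ha50a-cl pool_nvme1 vg_nvme1
```

With that layout, a single drive failure can only affect resources placed in that drive's pool.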

> Regards, rck


Re: [DRBD-user] Adding a Physical Disk to a Storage Pool

2022-04-28 Thread Eric Robinson


> -Original Message-
> From: drbd-user-boun...@lists.linbit.com On Behalf Of Roland Kammerer
> Sent: Thursday, April 28, 2022 1:41 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] Adding a Physical Disk to a Storage Pool
>
> On Thu, Apr 28, 2022 at 03:00:35AM +, Eric Robinson wrote:
> > > -Original Message-
> > > From: drbd-user-boun...@lists.linbit.com On Behalf Of Roland Kammerer
> > > Sent: Wednesday, April 27, 2022 2:07 AM
> > > To: drbd-user@lists.linbit.com
> > > Subject: Re: [DRBD-user] Adding a Physical Disk to a Storage Pool
> > >
> > > On Wed, Apr 27, 2022 at 04:52:24AM +, Eric Robinson wrote:
> > > > I've read through the Linstor User Guide and looked at Linstor's
> > > > many help screens, but I don't see an answer to this question. We
> > > > just added a new physical disk to our cluster nodes. What linstor
> > > > command adds them to existing storage pools? Or am I supposed to
> > > > add them manually to the underlying LVM volume group outside of
> Linstor?
> > >
> > > yes.
> >
> > I did that, and now vgdisplay  shows that the volume group's size has
> > increased by the expected amount, but the storage does not show in
> > linstor. The total and free capacities on the pools have not changed.
>
> not a LINSTOR dev, but AFAIK it only updates the information if it has to (and
> maybe periodically?), like when it needs to create a new resource. The
> easiest is to just 'systemctl restart linstor-satellite'
> on that host. There should also be some reconnect command in the client
> IIRC.
>

There are about 200 resources between the two nodes.  'systemctl restart 
linstor-satellite' does not touch the running resources or disrupt service, 
right?

> Regards, rck


Re: [DRBD-user] Adding a Physical Disk to a Storage Pool

2022-04-27 Thread Eric Robinson
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com On Behalf Of Roland Kammerer
> Sent: Wednesday, April 27, 2022 2:07 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] Adding a Physical Disk to a Storage Pool
>
> On Wed, Apr 27, 2022 at 04:52:24AM +, Eric Robinson wrote:
> > I've read through the Linstor User Guide and looked at Linstor's many
> > help screens, but I don't see an answer to this question. We just
> > added a new physical disk to our cluster nodes. What linstor command
> > adds them to existing storage pools? Or am I supposed to add them
> > manually to the underlying LVM volume group outside of Linstor?
>
> yes.

I did that, and now vgdisplay  shows that the volume group's size has increased 
by the expected amount, but the storage does not show in linstor. The total and 
free capacities on the pools have not changed.

[root@ha50a sa]# linstor physical-storage list
╭───────────────────────────────╮
┊ Size ┊ Rotational ┊ Nodes ┊
╞═══════════════════════════════╡
╰───────────────────────────────╯
[root@ha50a sa]# linstor storage-pool l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node     ┊ Driver   ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ ha50a-cl ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ ha50b-cl ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ ha51a-cl ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ ha51b-cl ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ quorum01 ┊ DISKLESS ┊          ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ spool0               ┊ ha50a-cl ┊ LVM      ┊ vg0      ┊ 4.62 TiB     ┊ 17.47 TiB     ┊ False        ┊ Ok    ┊            ┊
┊ spool0               ┊ ha50b-cl ┊ LVM      ┊ vg0      ┊ 4.62 TiB     ┊ 17.47 TiB     ┊ False        ┊ Ok    ┊            ┊
┊ spool1               ┊ ha51a-cl ┊ LVM      ┊ vg0      ┊ 3.35 TiB     ┊ 17.47 TiB     ┊ False        ┊ Ok    ┊            ┊
┊ spool1               ┊ ha51b-cl ┊ LVM      ┊ vg0      ┊ 3.34 TiB     ┊ 17.47 TiB     ┊ False        ┊ Ok    ┊            ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

>
> Regards, rck


[DRBD-user] Adding a Physical Disk to a Storage Pool

2022-04-26 Thread Eric Robinson
I've read through the Linstor User Guide and looked at Linstor's many help 
screens, but I don't see an answer to this question. We just added a new 
physical disk to our cluster nodes. What linstor command adds them to existing 
storage pools? Or am I supposed to add them manually to the underlying LVM 
volume group outside of Linstor?
-Eric



Re: [DRBD-user] Do You Always Have to Upgrade/Rebuild DRBD Whenever You Upgrade the Kernel?

2022-04-25 Thread Eric Robinson
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com On Behalf Of Roland Kammerer
> Sent: Monday, April 25, 2022 1:40 AM
> To: drbd-user@lists.linbit.com
> Cc: Roland Kammerer 
> Subject: Re: [DRBD-user] Do You Always Have to Upgrade/Rebuild DRBD
> Whenever You Upgrade the Kernel?
>
> On Fri, Apr 22, 2022 at 09:05:46PM +, Eric Robinson wrote:
> > Thanks for the feedback, Roland. I am actually a paying Linbit
> > customer, but I am unaware of the process to follow when upgrading the
> > OS. For example, I'm currently running Rocky Linux 8.5 (kernel
> > 4.18.0-348.2.1.el8_5.x86_64) and drbd 9.19.1-1.el8.
>
> The name of the meta package (i.e., "drbd") is IMO pretty unfortunate, that
> version refers to DRBD utils. It does not matter, can not remember when we
> did a breaking change in utils. What matters is kmod-drbd. But you most
> likely know that.
>

Indeed. We've been using DRBD since 2006. We're not experts, but we've picked 
up a few of the basics. 


> > There is a kernel
> > upgrade to 4.18.0-348.20.1.el8_5 available in the Rocky repo. If I
> > install that, does it break drbd?
>
> In 99.9% it does not break the binary module. That would only be the case if
> the vendor broke the kAbi. We usually have 1 binary module per RHEL dot
> release. So as long as you don't jump from like RHEL 8.4 to 8.5 or something
> then the same module should be fine. Also, you might enable a "dot
> repository" (one that has .../rhel8.5/... in the URL, not the global
> ".../rhel8/..." one that contains all the kernel modules for all RHEL (8.0, 
> 8.1,
> 8.2,...) if you are a customer. If you enabled the "dot repo" you should be
> fine taking the latest kernel + the latest kmod-drbd. If, and that is very
> unlikely, there are more than 1 kmod-drbd in a dot repo, use the one closest
> to your kernel version, usually the later one. Again, that usually is not the
> case, only if the kAbi was broken.
>
> Finding the best matching kernel module to a given kernel for all DRBD
> modules that exist is a problem we had to solve ourselves, we even have a
> public web service one can use like this:
> $ cat /etc/os-release | curl -T - -X POST drbd.io:3030/api/v1/best/$(uname -r)
> -s
>
> At the time of writing it answered with:
> kmod-drbd-9.1.7_4.18.0_348-1.x86_64.rpm
>
> If you are into RHEL versioning, you see that this is the first kernel-devel
> package in that kernel series. So, for 4.18 there has not been any breakage.
> Which again is the usual case, the people at Redhat know what they are
> doing. Most of the time :).
>
> Code for that web service and the underlying python library can be found
> here:
> https://github.com/LINBIT/bestdrbdmodule
> https://github.com/LINBIT/python-lbdist
>
> Regards, rck

This is all excellent information, thanks much.



Re: [DRBD-user] Do You Always Have to Upgrade/Rebuild DRBD Whenever You Upgrade the Kernel?

2022-04-22 Thread Eric Robinson
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com On Behalf Of Roland Kammerer
> Sent: Tuesday, April 19, 2022 1:55 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] Do You Always Have to Upgrade/Rebuild DRBD
> Whenever You Upgrade the Kernel?
>
> On Mon, Apr 18, 2022 at 09:43:38PM +, Eric Robinson wrote:
> > New kernel versions are released fairly often. What are the rules
> > about upgrading DRBD when that happens? Is it always necessary, or are
> > there certain thresholds within which it is okay to upgrade the kernel
> > without rebuilding DRBD? Rebuilding DRBD every time is pretty
> > disruptive.
>
> "depends", but in general that is how things work, there are no stable kABI
> guarantees from the Linux kernel itself. So you have to rebuild external
> modules. This is for example why dkms exists.
>
> In more detail it depends on the distribution. RHEL guarantees a stable kABI
> for certain symbols. Unfortunately sometimes even they make mistakes and
> break it, but one usually gets away with relatively few builds. Debian tries 
> to
> keep a stable kABI, Ubuntu does not seem to do so.
>
> Needless to say that we for example test for new RHEL kernels and compat
> and provide new binary modules for our customers. They just upgrade and
> don't have to worry. Same for Debian - we build when we need to - we build
> every Ubuntu kernel release. For the other distros it depends on their model
> (which roughly depends on how "good" their RHEL clone actually is). Even
> with rotating out old kernels rather quickly, we get
> 3 figure kernel numbers we build for easily.
>
> Regards, rck
> 

Thanks for the feedback, Roland. I am actually a paying Linbit customer, but I 
am unaware of the process to follow when upgrading the OS. For example, I'm 
currently running Rocky Linux 8.5 (kernel 4.18.0-348.2.1.el8_5.x86_64) and drbd 
9.19.1-1.el8. There is a kernel upgrade to 4.18.0-348.20.1.el8_5 available in 
the Rocky repo. If I install that, does it break drbd? If so, are you saying 
that I should do a rolling upgrade of drbd on my cluster?



Re: [DRBD-user] Fwd: Linstor User Guide Problem

2022-04-22 Thread Eric Robinson
Hi Gabor,

To answer your question, “What are you trying to achieve?” …

Our DB clusters have 4 NICs in 2 bonds. There is a “front” bond attached to the 
client-facing network, plus a “back” bond attached to the replication network. 
Currently, DRBD is configured on all nodes to use the IP addresses associated 
with the back bonds. We have almost 400 resources already up and running. I 
recently learned that DRBD can be configured to use multiple network paths in 
an active/standby fashion. I want to enable a secondary path using the front 
bonds. I’m looking for the easiest and most reliable way to accomplish that, so 
that if something happens to the replication network, DRBD will seamlessly 
transition to using the front network.


From: drbd-user-boun...@lists.linbit.com  
On Behalf Of Gábor Hernádi
Sent: Tuesday, April 19, 2022 8:27 AM
To: drbd-user 
Subject: [DRBD-user] Fwd: Linstor User Guide Problem

Hello,

On Mon, Apr 18, 2022 at 3:08 AM Eric Robinson <eric.robin...@psmnv.com> wrote:
We noticed section 2.7 of the Linstor User Guide ends with the following 
statement, apparently intended for internal use:

“FIXME describe how to route the controller <-> client communication through a 
specific netif.”

Will that be fixed soon? It’s information we need to know.

Thank you for noting. We already removed the FIXME and will try to add that 
information soon.

Besides the options you can use for defining the controller's IP address or 
hostname (described in [1]), there is no way to tell the client through which 
NIC it should reach the specified controller, using `linstor` commands.
Please be aware that the client does not know about the configured "netif"s 
specified through `linstor node interface ...` (sounds a bit hard having to ask 
the controller on how to contact the very same controller :) )

What exactly are you trying to achieve here?

Does anyone here know if NIC assignment commands can be applied retroactively 
to resources that already exist? For example, if I issue the commands…


linstor node interface create alpha 100G_nic 192.168.43.221

linstor node interface create alpha 10G_nic 192.168.43.231

linstor storage-pool set-property alpha pool_hdd PrefNic 10G_nic

linstor storage-pool set-property alpha pool_ssd PrefNic 100G_nic

Yes, that works, but it is not perfect right now. If you change the PrefNic for 
a storage pool, all resources using that storage pool will update accordingly, 
but also simultaneously. That means that updating the PrefNic on a diskless 
storage pool (for a diskless resource) will briefly disconnect it from all 
diskful peers, which might cause some problems, especially if that diskless 
resource is primary.


…can I then modify existing resources that use the pool_hdd and pool_ssd 
storage pools and make them start using multiple paths?

I am not sure I understand this question correctly: if the question is 
"can I move a resource from one storage pool to another", then the answer is 
no; you can only delete the resource, re-create it in the new storage pool, 
and let DRBD resync the data.
If you meant to ask whether you can configure DRBD to use multiple network 
paths, `PrefNic` is the wrong approach here. I just realized that we 
are missing that part in our documentation, which we will of course fix soon.
Until then, please feel free to explore the feature via the client's help 
messages:
   linstor resource-connection path create -h
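For anyone reading the archive before the documentation is updated, a two-path setup might look roughly like this (node, resource, path, and interface names and the addresses are invented here; check the client's `-h` output for the exact syntax of your version):

```
# A second interface on each node, one per additional network:
linstor node interface create alpha data_nic 192.168.44.221
linstor node interface create bravo data_nic 192.168.44.222

# A named DRBD path between the two nodes for resource res0,
# using those interfaces on either end:
linstor resource-connection path create alpha bravo res0 path2 data_nic data_nic
```

DRBD itself then decides which of the configured paths to use at any given time.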


[1] 
https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-using_the_linstor_client

--
Best regards,
Gabor Hernadi
Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.
___
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Do You Always Have to Upgrade/Rebuild DRBD Whenever You Upgrade the Kernel?

2022-04-18 Thread Eric Robinson
New kernel versions are released fairly often. What are the rules about 
upgrading DRBD when that happens? Is it always necessary, or are there certain 
thresholds within which it is okay to upgrade the kernel without rebuilding 
DRBD? Rebuilding DRBD every time is pretty disruptive.

-Eric




[DRBD-user] Linstor User Guide Problem

2022-04-17 Thread Eric Robinson
We noticed section 2.7 of the Linstor User Guide ends with the following 
statement, apparently intended for internal use:

"FIXME describe how to route the controller <-> client communication through a 
specific netif."

Will that be fixed soon? It's information we need to know.

Does anyone here know if NIC assignment commands can be applied retroactively 
to resources that already exist? For example, if I issue the commands...


linstor node interface create alpha 100G_nic 192.168.43.221

linstor node interface create alpha 10G_nic 192.168.43.231

linstor storage-pool set-property alpha pool_hdd PrefNic 10G_nic

linstor storage-pool set-property alpha pool_ssd PrefNic 100G_nic


...can I then modify existing resources that use the pool_hdd and pool_ssd 
storage pools and make them start using multiple paths?

-Eric






Re: [DRBD-user] 200+ Resources Created, and Now I See My Mistake

2022-03-30 Thread Eric Robinson
Hi Andrei,

> -Original Message-
> From: kvaps 
> Sent: Monday, March 28, 2022 2:50 PM
> To: Eric Robinson 
> Cc: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] 200+ Resources Created, and Now I See My
> Mistake
>
> Hi Eric,
>
> I have some experience with that.
> I think the easiest way would be to make a database backup and remove all
> `-repl` and `-REPL` suffixes from it.
> Then upload it back.
>

That sounds promising, I'll look into that.

-Eric


[DRBD-user] 200+ Resources Created, and Now I See My Mistake

2022-03-27 Thread Eric Robinson
We have created 200+ resources on our new 5-node Linstor cluster. The 
database servers each have 4 NICs in 2 bonded sets. There's a bond on the 
"front" (client-facing) side of the server and a bond on the "back" 
(replication network) side of the server. When I built the cluster, I didn't 
know about NIC management. I wanted the servers to use their "back" bonds for 
DRBD traffic, so I created separate DNS names for the back bonds. For example, 
if a server is named "server-a," then the DNS name "server-a" resolves to the 
IP address of the front bond, and "server-a-repl" resolves to the IP address of 
the back bond.

When I created the cluster, I used node names server-a-repl, server-b-repl, 
server-c-repl, server-d-repl, because I thought that was the way to make sure 
DRBD used the back bonds. That was dumb. I should have left the node names 
alone and just created additional NICs and Linstor could have used PrefNic to 
assign the proper replication links.

So here's the question. Is there an easy, non-disruptive way to re-name cluster 
nodes and re-assign replication links without rebuilding almost everything from 
scratch?

-Eric







[DRBD-user] DRBD Trim Support

2022-01-09 Thread Eric Robinson
According to the documentation, SSD TRIM/Discard support has been in DRBD since 
version 8. DRBD is supposed to detect if the underlying storage supports trim 
and, if so, automatically enable it. However, I am unable to TRIM my DRBD 
volumes.

[root@ha50a mysqld]# fstrim /ha
fstrim: /ha: the discard operation is not supported

It looks like the DRBD disk is the only thing in the stack that does not have trim 
enabled...

[root@ha50a mysqld]# lsblk -D
NAME               DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                       0        4K       2G         0
├─sda1                    0        4K       2G         0
├─sda2                    0        4K       2G         0
└─sda3                    0        4K       2G         0
  ├─ha50a-root            0        4K       2G         0
  └─ha50a-swap            0        4K       2G         0
nvme4n1                   0      512B       2T         0
nvme2n1                   0      512B       2T         0
nvme0n1                   0      512B       2T         0
└─vg0-ha--001_00          0      512B       2T         0
  └─drbd1000              0        0B       0B         0
nvme3n1                   0      512B       2T         0
nvme1n1                   0      512B       2T         0
nvme5n1                   0      512B       2T         0

I can TRIM those drives when I don't have DRBD in the stack.

Any ideas?
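In case it is useful to anyone hitting the same wall: DRBD masks discard unless every backing device on every connected node supports it, so `lsblk -D` is worth running on all peers, not just one. Beyond that, drbd.conf(5) documents a disk-section option (available since DRBD 8.4.7) that affects how discards and zero-outs are handled; a hedged configuration sketch, with the resource name invented:

```
resource r0 {
    disk {
        # Treat discarded regions as reliably zeroed, allowing DRBD
        # to pass aligned discards through to the backing device:
        discard-zeroes-if-aligned yes;
    }
}
```

Whether this resolves a given case depends on the DRBD version and the backing stack, so treat it as a starting point rather than a fix.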



Re: [DRBD-user] DRBD + ZFS

2021-08-31 Thread Eric Robinson
David --

That is good feedback and thanks much for the link. If I gather correctly, the 
thrust of the article is related to InnoDB optimization. Believe it or not, we 
employ a hybrid model. Each of our databases consists of approximately 5000 
tables of different sizes and structures. Most of them are still on MyISAM with 
only 20 or so on InnoDB. (In my experience over the past 15 years of hosting 
hundreds of MySQL databases, InnoDB is a bloated, fragile, resource-gulping 
freakshow, so we only use it for the handful of tables that demand it. That 
said, I realize most other people would see it differently.)

I hope you won't mind if I circle back and ask you some questions when the new 
servers get here and I start testing different approaches to storage.

> -Original Message-
> From: David Bruzos 
> Sent: Monday, August 30, 2021 6:26 AM
> To: Eric Robinson 
> Cc: ra...@isoc.org.il; drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] DRBD + ZFS
>
> Hi Eric,
> Sorry about the delay.  The article you provided is interesting, but 
> rather
> specific to a workload that would show rather dramatic results on VDO.  In
> your case, the main objective is making the most out of your NVME storage,
> while maintaining good performance.  The article would be very much
> applicable if you were doing replication over a slow WAN link or something
> like that, but I imagine that the network is not going to be a bottleneck for
> you, so saving throughput at the DRBD layer is probably not a big advantage.
> The real space and performance killer (if done wrong) in your case is 
> going
> to be proper block alignments to optimize the mysql workload.  Depending
> on your underlying storage's optimal block size (usually 4KB) and the vdev
> type you want to use (EG. raidz, mirror), you will have to make sure that
> everything is optimized for mysql's 16KB writes.  As I pointed out earlier,
> mirror will be simplest/fastest and raidz is doable, but will be slower for
> writes (may not matter if you got enough iops).  The key is that with raidz,
> you will have to take more factors into account to ensure everything is
> optimal.  In my case for example, my newest setup uses raidz and
> compression for making the most out of my NVME, but I use ashift=9 (512
> byte blocks) to be able to make 4K zvols for my VMs and still greatly benefit
> from compression.
> It is important to point out that the raidz details are not unique to ZFS.
> Most people that use traditional raid5 setups use it in a suboptimal manner
> and actually have terrible performance and either can't tell, or eventually
> move to raid10, because "raid5 sucks".  In any case, to answer your question,
> I would still use ZFS instead of VDO for multiple reasons and I would still 
> use it
> only under DRBD in this case.  You have a standard workload, so you should
> be able to optimize it to fit your objectives.
>
> Here is a good article about mysql on ZFS that should get you started:
>
> https://shatteredsilicon.net/blog/2020/06/05/mysql-mariadb-innodb-on-
> zfs/
>
>
> David
>
> --
> David Bruzos (Systems Administrator)
> Jacksonville Port Authority
> 2831 Talleyrand Ave.
> Jacksonville, FL  32206
> Cell: (904) 625-0969
> Office: (904) 357-3069
> Email: david.bru...@jaxport.com
>
> On Tue, Aug 24, 2021 at 09:26:22PM +, Eric Robinson wrote:
> > EXTERNAL
> > This message is from an external sender.
> > Please use caution when opening attachments, clicking links, and
> responding.
> > If in doubt, contact the person or the helpdesk by phone.
> > 
> >
> >
> > Hi David --
> >
> > Here is a link to a Linbit article about using DRBD with VDO. While the 
> > focus
> of this article is VDO, I assume the compression recommendation would
> apply to other technologies such as ZFS. As the article states, their goal was
> to compress data before it gets passed off to DRBD, because then DRBD
> replication is faster and more efficient. This was echoed in some follow-up
> conversation I had with a Linbit rep (or someone from Red Hat, I forget
> which).
> >
> > https://linbit.com/blog/albireo-virtual-data-optimizer-vdo-on-drbd/
> >
> > My use case is multi-tenant MySQL servers. I'll have 125+ separate
> instances of MySQL running on each cluster node, all out of separate
> directories and listening on separate ports. The instances will be divided 
> into
> 4 sets of 50, which live on 4 separate filesystems, on 4 separate DRBD disks.
> I've used this approach before very successfully with up to 60 MySQL
> instances, and now I'm dramatically increasing the server power and doubling
> the number of instances. 4 separat

Re: [DRBD-user] DRBD + ZFS

2021-08-24 Thread Eric Robinson
Hi David --

Here is a link to a Linbit article about using DRBD with VDO. While the focus 
of this article is VDO, I assume the compression recommendation would apply to 
other technologies such as ZFS. As the article states, their goal was to 
compress data before it gets passed off to DRBD, because then DRBD replication 
is faster and more efficient. This was echoed in some follow-up conversation I 
had with a Linbit rep (or someone from Red Hat, I forget which).

https://linbit.com/blog/albireo-virtual-data-optimizer-vdo-on-drbd/

My use case is multi-tenant MySQL servers. I'll have 125+ separate instances of 
MySQL running on each cluster node, all out of separate directories and 
listening on separate ports. The instances will be divided into 4 sets of 50, 
which live on 4 separate filesystems, on 4 separate DRBD disks. I've used this 
approach before very successfully with up to 60 MySQL instances, and now I'm 
dramatically increasing the server power and doubling the number of instances. 
4 separate DRBD threads will handle the replication. I'll be using 
corosync+pacemaker for the HA stack. I'd really like to compress the data and 
make the most of the available NVME media. The servers do not have RAID 
controllers. I'll be using ZFS, mdraid, or LVM to create 4 separate arrays for 
my DRBD backing disks.
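If ZFS ends up being the choice for those backing arrays, one hedged sketch of a 16K-aligned zvol for one of the four DRBD backing disks (pool name, devices, and size are invented; the 16K figure follows InnoDB's default page size):

```
# Mirrored NVME pool with 4K sectors:
zpool create -o ashift=12 dbpool mirror /dev/nvme0n1 /dev/nvme1n1

# A zvol sized for one set of ~50 MySQL instances; 16K volblocksize
# to line up with InnoDB pages, lz4 for cheap compression:
zfs create -V 2T -o volblocksize=16k -o compression=lz4 dbpool/drbd-set1
```

/dev/zvol/dbpool/drbd-set1 would then serve as the backing disk of one DRBD resource, with the filesystem created on the DRBD device above it.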

--Eric

> -Original Message-
> From: David Bruzos 
> Sent: Tuesday, August 24, 2021 2:03 PM
> To: Eric Robinson 
> Cc: ra...@isoc.org.il; drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] DRBD + ZFS
>
> Hello Eric:
>
> > What degree of performance degradation have you observed with DRBD
> over ZFS? Our servers will be using NVME drives with 25Gbit networking:
>
> Unfortunately, I have not had the time to properly benchmark and
> compare a setup like yours with DRBD on top of ZFS.  Very superficial tests
> show that my I/O is more than sufficient for my workload, so I'm more
> interested in the data integrity, snapshotting, compression, etc.  I would not
> want to create misinformation by sharing I/O stats that are not taking into
> account the many aspects of a proper ZFS benchmark and that are not being
> compared against an alternative setup.
> In the days of spinning rust storage, I always used mirrored vdevs, always
> added a fast ZIL, lots of RAM for ARC and a couple of caching devices for
> L2ARC, so the performance was great when compared with the alternatives.
>
> > Since you don't recommend having ZFS above DRBD, what filesystem do
> you use over DRBD?
>
> I've always had good results with XFS on LVM (very thin).  That 
> combination
> usually gives you good flexibility at the VM level and the performance is
> great.  These days, ext4 is a reasonable choice, but I still use XFS most of 
> the
> time.
> I would like to see what other folks think about the XFS+LVM combination
> for VMs vs something like ext4+LVM.
>
> > Linbit recommends that compression take place above DRBD rather than
> below. What are your thoughts about their recommendation versus your
> approach?
>
> If you can provide a link to their recommendation, I can be more specific.
> In any case, I'm sure their recommendation is reasonable depending on what
> your specific workload is.  In my case, I mostly use compression at the
> backing storage level, because it gives me a predictable and well understood
> VM environment where I can run a wide variety of guest operating systems,
> applications, workloads, etc, without having to worry about the specifics for
> each possible VM scenario.
> The reason I normally don't use ZFS for VMs is because I believe it best
> serves its purpose at the backing storage level for many reasons.  ZFS is
> designed to leverage lots of RAM for ARC, to handle the storage directly, to
> do many things with your hardware that are very much abstracted away at
> the guest level.
>
> What is your specific usage scenario?
>
>
> --
> David Bruzos (Systems Administrator)
> Jacksonville Port Authority
> 2831 Talleyrand Ave.
> Jacksonville, FL  32206
> Cell: (904) 625-0969
> Office: (904) 357-3069
> Email: david.bru...@jaxport.com
>
> On Tue, Aug 24, 2021 at 03:21:10PM +, Eric Robinson wrote:
> >
> >
> > Hi David --
> >
> > Thanks for your feedback! I do have a couple of follow-up
> questions/comments.
> >
> > What degree of performance degradation have you observed with DRBD
> over ZFS? Our servers will be using NVME drives wi

Re: [DRBD-user] DRBD + ZFS

2021-08-24 Thread Eric Robinson

Hi David --

Thanks for your feedback! I do have a couple of follow-up questions/comments.

What degree of performance degradation have you observed with DRBD over ZFS? 
Our servers will be using NVME drives with 25Gbit networking.
Since you don't recommend having ZFS above DRBD, what filesystem do you use 
over DRBD?
Linbit recommends that compression take place above DRBD rather than below. 
What are your thoughts about their recommendation versus your approach?

--Eric




> -Original Message-
> From: David Bruzos 
> Sent: Saturday, August 21, 2021 8:34 AM
> To: Eric Robinson 
> Cc: ra...@isoc.org.il; drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] DRBD + ZFS
>
> Hello folks,
> I've used DRBD over ZFS for many years and my experience has been very
> positive.  My primary use case has been virtual machine backing storage for
> Xen hypervisors, with dom0 running ZFS and DRBD.  The realtime nature of
> DRBD replication allows for VM migrations, etc, and ZFS makes remote
> incremental backups awesome.  Overall, it is a combination that is hard to
> beat.
>
> * Key things to keep in mind:
>
> . The performance of DRBD on ZFS is not the best in the world, but the
> benefits of a properly configured and used setup far outweigh the
> performance costs.
> . If you are not limited by storage size (typical when using rotating 
> disks), I
> would absolutely recommend mirror vdevs with ashift=12 for best results in
> most circumstances.
> . If space is a limiting factor (typical with SSD/NVME), I use raidz, but 
> careful
> considerations have to be made, so you don't end up wasting tons of space,
> because of ashift/blocksize/striping issues.
> . Compression works great under the DRBD devices, but volblocksize/ashift
> details are extremely important to get the most out of it.
> . I would not create additional ZFS file systems on top of the DRBD 
> devices
> for compression or any other intensive feature, just not worth it, you want
> that as close to the physical storage as possible.
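The key points above might translate into commands along these lines (purely illustrative pool and device names; the ashift and layout values are the ones mentioned above, not universal recommendations):

```
# Rotating disks, space not a concern: mirrored vdevs with 4K sectors:
zpool create -o ashift=12 tank mirror sda sdb mirror sdc sdd

# Compression under (not over) the DRBD device:
zfs set compression=lz4 tank

# A zvol to serve as the DRBD backing device:
zfs create -V 500G tank/drbd-r0-backing
```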
>
> I do run a few ZFS file systems on virtual machines that are backed by 
> DRBD
> devices on top of ZFS, but I am after other ZFS features in those cases.  The
> VMs running ZFS have compression=off, no vdev redundancy, optimized
> volblocksize for the situation/workload in question, etc.  My typical goto
> filesystem for VMs is XFS, because it is lean-and-mean and has the kind of
> features that everyone should want in a general purpose FS.
>
> If you have specific questions, let me know.
>
> David
>
> --
> David Bruzos (Systems Administrator)
> Jacksonville Port Authority
> 2831 Talleyrand Ave.
> Jacksonville, FL  32206
> Cell: (904) 625-0969
> Office: (904) 357-3069
> Email: david.bru...@jaxport.com
>
> On Fri, Aug 20, 2021 at 11:32:31AM +, Eric Robinson wrote:
> >
> > My main motivation is the desire for a compressed filesystem. I have
> experimented with using VDO for that purpose and it works, but the setup is
> complex and I don’t know if I trust it to work well when VDO is in a stack of
> Pacemaker cluster resources. Is there a better way of getting compression to
> work above DRBD?
> >
> > -Eric
> >
> >
> > From: ra...@isoc.org.il 
> > Sent: Thursday, August 19, 2021 4:43 PM
> > To: Eric Robinson 
> > Cc: drbd-user@lists.linbit.com
> > Subject: Re: [DRBD-user] DRBD + ZFS
> >
> > Not sure ZFS is the right choice as an underlay for a resource; it is
> > powerful but also complex (as a code base), which will probably make it
> slow.
> >
> > unless you are going to expose the ZVOL or the dataset directly to be
> > consumed, stacking ZFS over DRBD over ZFS seems to me like a bad idea.
> >
> >
> >
> > Rabin
> >
> >
> > On Wed, 18 Aug 2021 at 09:37, Eric Robinson
> <eric.robin...@psmnv.com> wrote:
> > I’m considering deploying DRBD between ZFS layers. The lowest layer
> RAIDZ will serve as the DRBD backing device. Then I would build another ZFS
> filesystem on top to benefit from compression. Any thoughs, experiences,
> opinions, positive or negative?
> >
> > --Eric
> >
> >
> >
> >
> >

Re: [DRBD-user] DRBD + ZFS

2021-08-20 Thread Eric Robinson
My main motivation is the desire for a compressed filesystem. I have 
experimented with using VDO for that purpose and it works, but the setup is 
complex and I don’t know if I trust it to work well when VDO is in a stack of 
Pacemaker cluster resources. Is there a better way of getting compression to 
work above DRBD?

-Eric


From: ra...@isoc.org.il 
Sent: Thursday, August 19, 2021 4:43 PM
To: Eric Robinson 
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] DRBD + ZFS

Not sure ZFS is the right choice as an underlay for a resource;
it is powerful but also complex (as a code base), which will probably make 
it slow.

unless you are going to expose the ZVOL or the dataset directly to be consumed,
stacking ZFS over DRBD over ZFS seems to me like a bad idea.



Rabin


On Wed, 18 Aug 2021 at 09:37, Eric Robinson 
<eric.robin...@psmnv.com> wrote:
I’m considering deploying DRBD between ZFS layers. The lowest layer RAIDZ will 
serve as the DRBD backing device. Then I would build another ZFS filesystem on 
top to benefit from compression. Any thoughts, experiences, opinions, positive 
or negative?

--Eric







Re: [DRBD-user] DRBD + ZFS

2021-08-19 Thread Eric Robinson
> -Original Message-
> From: Emmanuel Florac 
> Sent: Thursday, August 19, 2021 8:16 AM
> To: Eric Robinson 
> Cc: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] DRBD + ZFS
>
> Le Wed, 18 Aug 2021 03:39:01 +
> Eric Robinson  écrivait:
>
> > I'm considering deploying DRBD between ZFS layers. The lowest layer
> > RAIDZ will serve as the DRBD backing device. Then I would build
> > another ZFS filesystem on top to benefit from compression. Any
> > thoughts, experiences, opinions, positive or negative?
>
> But isn't ZFS implementing its own replication layer? Why go the DRBD route
> instead?
>

I'm not well read on ZFS, but I believe ZFS uses periodic scheduled 
replication, not real-time block-level replication. ZFS does not have an 
equivalent to DRBD protocol C. Am I mistaken?
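ZFS's native replication is indeed snapshot-based send/receive rather than synchronous block replication; a minimal periodic cycle looks roughly like this (host and dataset names are invented):

```
# Initial full replication of a snapshot:
zfs snapshot tank/data@t1
zfs send tank/data@t1 | ssh backup-host zfs receive tank/data

# Later, send only the delta between two snapshots:
zfs snapshot tank/data@t2
zfs send -i tank/data@t1 tank/data@t2 | ssh backup-host zfs receive tank/data
```

Anything written between cycles is not on the peer until the next send, unlike DRBD protocol C, which acknowledges a write only after it has reached the peer.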

--Eric




[DRBD-user] DRBD + ZFS

2021-08-18 Thread Eric Robinson
I'm considering deploying DRBD between ZFS layers. The lowest layer RAIDZ will 
serve as the DRBD backing device. Then I would build another ZFS filesystem on 
top to benefit from compression. Any thoughts, experiences, opinions, positive 
or negative?

--Eric







[DRBD-user] FW: Getting snapshotting working with VDO: a recent experience.

2021-07-06 Thread Eric Robinson
For anyone else who is considering using VDO over DRBD. This is what we 
discovered about getting it to work.

From: Sweet Tea Dorminy 
Sent: Tuesday, July 6, 2021 1:50 PM
To: vdo-devel 
Subject: Getting snapshotting working with VDO: a recent experience.

Recently I had the pleasure of helping someone figure out how to use VDO with 
their snapshotting solution, and figured I'd send out a summary, with their 
permission, to the list in case it helps anyone else.
Setup: Eric was using DRBD for data replication, and VDO on top to provide data 
reduction, similarly to Linbit's 
article. 
The stack looked like this:
Physical -> LVM -> DRBD -> VDO -> FileSystem
Specifically:
/dev/sda
/dev/vg/lv
/dev/drbd0
/dev/vdo0
/my-filesystem (xfs)

Problem: Eric wanted to add snapshotting to the mix, taking a snapshot and 
verifying that it matched the filesystem. Without VDO in the stack, this is a 
simple matter of taking a snapshot of /dev/vg/lv and mounting the snapshot. 
With VDO, though, how to get from /dev/vg/lv-snap to a verifiable filesystem?
- 'vdo create --device=/dev/vg/lv-snap' doesn't work -- it formats a new VDO, 
and complains if it finds an already existing vdo.
- 'vdo import' doesn't work -- it complains there's already a VDO with the same 
UUID on the system.
- just mounting /dev/vg/lv-snap doesn't work, as it is a vdo, not a filesystem.

Solving: In order to start a VDO from storage containing an already formatted 
VDO, 'vdo import' is necessary, but as said, it reported a UUID collision. 
After some searching, we realized passing "--uuid=''" (the empty string) to vdo 
import would import the snapshot VDO and also change the UUID, so it wouldn't 
collide with the original VDO.

For instance:
[sweettea@localhost ~]$ sudo vdo create --name vdo0 --device 
/dev/fedora_localhost-live/vms
Creating VDO vdo0
  Logical blocks defaulted to 6802574 blocks.
  The VDO volume can address 26 GB in 13 data slabs, each 2 GB.
  It can grow to address at most 16 TB of physical storage in 8192 slabs.
  If a larger maximum size might be needed, use bigger slabs.
Starting VDO vdo0
Starting compression on VDO vdo0
VDO instance 0 volume is ready at /dev/mapper/vdo0
[sweettea@localhost ~]$ sudo lvcreate --size 2G --snapshot --name vms_snap 
/dev/fedora_localhost-live/vms
  Logical volume "vms_snap" created.
[sweettea@localhost ~]$ sudo vdo import --name vdo0-snap --device 
/dev/fedora_localhost-live/vms_snap
Importing VDO vdo0-snap
vdo: ERROR - UUID df61bc46-13dc-4091-bdec-4896c744888c already exists in VDO 
volume(s) stored on 
/dev/disk/by-id/dm-uuid-LVM-Gexh1cit2vwmcvIf2AAvullg3mWvrnql0SluovtpLXaYGoyPw84zRD4NPmGO3Nu4
[sweettea@localhost ~]$ sudo vdo import --name vdo0-snap --device 
/dev/fedora_localhost-live/vms_snap --uuid
usage: vdo import [-h] -n <name> --device <device> [--activate {disabled,enabled}]
                  [--blockMapCacheSize <size>] [--blockMapPeriod <period>]
                  [--compression {disabled,enabled}] [--deduplication {disabled,enabled}]
                  [--emulate512 {disabled,enabled}] [--maxDiscardSize <size>]
                  [--uuid <uuid>] [--vdoAckThreads <threads>]
                  [--vdoBioRotationInterval <interval>] [--vdoBioThreads <threads>]
                  [--vdoCpuThreads <threads>] [--vdoHashZoneThreads <threads>]
                  [--vdoLogicalThreads <threads>]
                  [--vdoLogLevel {critical,error,warning,notice,info,debug}]
                  [--vdoPhysicalThreads <threads>]
                  [--writePolicy {async,async-unsafe,sync,auto}]
                  [-f <file>] [--logfile <path>] [--verbose]
vdo import: error: argument --uuid: expected one argument
[sweettea@localhost ~]$ sudo vdo import --name vdo0-snap --device 
/dev/fedora_localhost-live/vms_snap --uuid ''
Importing VDO vdo0-snap
Starting VDO vdo0-snap
Starting compression on VDO vdo0-snap
VDO instance 1 volume is ready at /dev/mapper/vdo0-snap

(The documentation on --uuid was somewhat confusing, but passing an empty 
string worked.)
However, mounting the resulting snapshot still didn't work.
'journalctl -k' showed these kernel logs from a different attempt:
[3714073.665039] kvdo3:dmsetup: underlying device, REQ_FLUSH: supported, 
REQ_FUA: not supported
[3714073.665041] kvdo3:dmsetup: Using write policy async automatically.
[3714073.665042] kvdo3:dmsetup: loading device 'snap_vdo0'
[3714073.665055] kvdo3:dmsetup: zones: 1 logical, 1 physical, 1 hash; base 
threads: 5
[3714073.724597] kvdo3:dmsetup: starting device 'snap_vdo0'
[3714073.724607] kvdo3:journalQ: Device was dirty, rebuilding reference counts
[3714074.025715] kvdo3:journalQ: Finished reading recovery journal
[3714074.032016] kvdo3:journalQ: Highest-numbered recovery journal block has 
sequence number 4989105, and the highest-numbered usable block is 4989105
[3714074.337654] kvdo3:physQ0: Replaying entries into slab journals for zone 0
[3714074.536446] kvdo3:physQ0: Recreating missing journal entries for zone 0
[3714074.536503] 

Re: [DRBD-user] mounting drbd snapshots that contain vdo devices

2021-06-24 Thread Eric Robinson
Bump.

Anybody have experience with vdo on drbd?


From: drbd-user-boun...@lists.linbit.com  
On Behalf Of Eric Robinson
Sent: Sunday, June 20, 2021 9:21 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] mounting drbd snapshots that contain vdo devices

I have a filesystem directly on a drbd disk, and the drbd disk has an LVM 
volume as its backing device. It looks like this...


lvm volume
drbd disk
ext4 filesystem


I can snapshot the backing device, mount it, and explore the filesystems.

However, I recently started experimenting with vdo, so now the stack looks like 
this

lvm volume
drbd disk
vdo disk
ext4 filesystem

When I snapshot the backing device, I can no longer mount it because it 
complains that vdo is an unknown filesystem type. Any ideas how I can mount the 
vdo disk from the snapshot?

-Eric






Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.
___
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] 20TB cluster: 1 big drbd disk or multiple small ones?

2021-06-21 Thread Eric Robinson
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com On Behalf Of Matt Kereczman
> Sent: Monday, June 21, 2021 1:09 PM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] 20TB cluster: 1 big drbd disk or multiple small ones?
>
> On 6/21/21 3:19 AM, Eric Robinson wrote:
> > Suppose you're building a cluster with 20TB of storage that will run
> > multiple instances of MySQL. Which is better, creating one big drbd
> > disk, or carving it up into multiple smaller ones?
>
> Many small DRBD resources is likely better than one large resource in this
> case. With multiple resources you'll have independent failure domains as
> opposed to one large one.

My mind is leaning in the same direction.

> Also, each DRBD resource will get its own set of
> processes (worker/sender/receiver/etc.) and network socket which will
> likely help with performance.
>

Hmm. Good thought. Had not considered that.

> You could even use, "options { cpu-mask $mask; }", in your DRBD
> configurations to ensure each resource gets its own CPU thread. Here is a
> link to the relevant section in the DRBD users guide:
> https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-latency-tuning-cpu-mask
>
> >
> > Also, any thoughts about a preferred filesystem for this scenario?
> 
> Any journaling FS (xfs/ext4) mounted with noatime should be fine.
>

Not xfs then?

> Best Regards,
> Matt
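For reference, the cpu-mask setting Matt mentions goes in a resource's options 
section. A sketch (resource names and masks here are hypothetical; the value is 
a CPU bitmask):

```
resource r0 {
    options {
        cpu-mask "1";    # bit 0 set: pin r0's DRBD threads to CPU 0
    }
    # ... disk, connection, and volume sections as usual ...
}

resource r1 {
    options {
        cpu-mask "2";    # bit 1 set: pin r1's DRBD threads to CPU 1
    }
    # ...
}
```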


[DRBD-user] 20TB cluster: 1 big drbd disk or multiple small ones?

2021-06-21 Thread Eric Robinson
Suppose you're building a cluster with 20TB of storage that will run multiple 
instances of MySQL. Which is better, creating one big drbd disk, or carving it 
up into multiple smaller ones?

Also, any thoughts about a preferred filesystem for this scenario?

-Eric








[DRBD-user] mounting drbd snapshots that contain vdo devices

2021-06-20 Thread Eric Robinson
I have a filesystem directly on a drbd disk, and the drbd disk has an LVM 
volume as its backing device. It looks like this...


lvm volume
drbd disk
ext4 filesystem


I can snapshot the backing device, mount it, and explore the filesystems.

However, I recently started experimenting with vdo, so now the stack looks like 
this

lvm volume
drbd disk
vdo disk
ext4 filesystem

When I snapshot the backing device, I can no longer mount it because it 
complains that vdo is an unknown filesystem type. Any ideas how I can mount the 
vdo disk from the snapshot?

-Eric







Re: [DRBD-user] The Problem of File System Corruption w/DRBD

2021-06-07 Thread Eric Robinson
> -Original Message-
> From: Gionatan Danti 
> Sent: Sunday, June 6, 2021 11:02 AM
> To: Eric Robinson 
> Cc: Robert Altnoeder ; drbd-u...@lists.linbit.com
> Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD
>
> Il 2021-06-04 15:08 Eric Robinson ha scritto:
> > Those are all good points. Since the three legs of the information
> > security triad are confidentiality, integrity, and availability, this
> > is ultimately a security issue. We all know that information security
> > is not about eliminating all possible risks, as that is an
> > unattainable goal. It is about mitigating risks to acceptable levels.
> > So I guess it boils down to how each person evaluates the risks in
> > their own environment. Over my 38-year career, and especially the past
> > 15 years of using Linux HA, I've seen more filesystem-type issues than
> > the other possible issues you mentioned, so that one tends to feature
> > more prominently on my risk radar.
>
> For the very limited goal of protecting from filesystem corruptions, you can
> use a snapshot/CoW layer as thinlvm. Keep multiple rolling snapshots and
> you can recover from sudden filesystem corruption. However this is simply
> move the SPOF down to the CoW layer (thinlvm, which is quite complex by
> itself and can be considered a stripped-down
> filesystem/allocator) or up to the application layer (where corruptions are
> relatively quite common).
>
> That said, nowadays a mature filesystem as EXT4 and XFS can be corrupted
> (barring obscure bugs) only by:
> - a double mount from different machines;
> - a direct write to the underlying raw disks;
> - a serious hardware issue.
>
> For what it is worth I am now accustomed to ZFS strong data integrity
> guarantee, but I fully realize that this does *not* protect from any
> corruptions scenario by itself, not even on XFS-over-ZVOL-over-DRBD-over-
> ZFS setups. If anything, a more complex filesystem (and I/O setup) has
> *greater* chances of exposing uncommon bugs.
>
> So: I strongly advise on placing your filesystem over a snapshot layer, but do
> not expect this to shield from any storage related issue.
> Regards.
>

That would require a model where DRBD is sandwiched between two LVM layers. 
First, the DRBD backing device is an LVM partition. Then we create an LVM 
partition on top of DRBD, and create our filesystem on top of that. I've tried 
that approach before and had very poor success with cluster failover, due to 
the LVM resource agent not working as expected, volumes going inactive when 
they should be active, etc. Maybe it's just too complex for my brain. 

If rolling snapshots are an acceptable solution, why not just periodically 
snapshot the whole drbd volume? Then, in the unlikely event that filesystem 
corruption occurs, fall back to the snapshot from before the corruption 
happened. I assume that would require a full drbd resync from primary to 
secondary, but that's probably easier than restoring from backup media.
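A periodic snapshot of the DRBD backing LV could be driven by a small cron 
script along these lines (a sketch only: VG/LV names, snapshot size, and 
retention count are hypothetical; by default it only prints the commands):

```shell
#!/bin/sh
# Sketch: rolling snapshots of the DRBD backing LV, keeping the last
# KEEP copies. VG/LV names, snapshot size, and retention count are
# hypothetical. By default commands are only printed; set RUN= to run.
set -eu
RUN=${RUN:-echo}

VG=vg0
LV=drbd_backing
KEEP=3

STAMP=$(date +%Y%m%d%H%M%S)
$RUN lvcreate --size 5G --snapshot --name "${LV}_snap_${STAMP}" "/dev/$VG/$LV"

# Prune snapshots beyond the retention count (only in real runs, since
# lvs output is not available when we are merely printing commands).
if [ -z "$RUN" ]; then
    lvs --noheadings -o lv_name "$VG" | tr -d ' ' \
        | grep "^${LV}_snap_" | sort -r | tail -n +"$((KEEP + 1))" \
        | while read -r old; do
              lvremove -y "/dev/$VG/$old"
          done
fi
```

Classic thick snapshots are shown; a thin pool would avoid having to pre-size 
each snapshot.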


Re: [DRBD-user] The Problem of File System Corruption w/DRBD

2021-06-04 Thread Eric Robinson
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com On Behalf Of Robert Altnoeder
> Sent: Friday, June 4, 2021 6:15 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD
>
> On 03 Jun 2021, at 21:41, Eric Robinson  wrote:
> >
> > It's a good thing that DRBD faithfully replicates whatever is passed to it.
> However, since that is true, it does tend to enable the problem of filesystem
> corruption taking down a whole cluster. I'm just asking people for any
> suggestions they may have for alleviating that problem. If it’s not fixable,
> then it’s not fixable.
> >
> > Part of the reason I’m asking is because we’re about to build a whole new
> data center, and after 15 years of using DRBD we are beginning to look at
> other HA options, mainly because of the filesystem as a weak point. I should
> mention that it has *never* happened before, but the thought of it is scary.
>
> Oh, you’ve opened that can of worms, one of my favorite topics ;)
>
> I guess, I have bad news for you, because you have only just found the
> entrance to that rabbit hole. There are *lots* of things that can take down
> your entire cluster, and the filesystem is probably the least of your concerns
> here, so I think you're looking at the wrong thing. Unfortunately, none
> of them can be fixed by high-availability, because the problem area that you
> are talking about is not high-availability, it’s high-reliability.
>
> Let me give you a few examples on why high-reliability is something
> completely different than high-availability:
>
> 1. Imagine your application ends up in a corrupted state, but keeps running.
> Pacemaker might not even see that - the monitoring possibly just sees that
> the application is still running, so the cluster does not see any need to do
> anything, but the application does not work anymore.
>
> 2. Imagine your application crashes and leaves its data behind in a corrupted
> state in a file on a perfectly good filesystem - e.g., crashes after having
> written only 20% of the file’s content. Now Pacemaker restarts the
> application, but due to the corrupted content in its data file, the 
> application
> cannot start. Pacemaker migrates the application to another node, which
> obviously - due to synchronous replication - has the same data. The
> application cannot start there. The whole game continues until Pacemaker
> runs out of nodes to try and start the application, because it doesn’t work
> anywhere.
>
> 3. Even worse, there could be a bug hidden in Pacemaker or Corosync that
> crashes the cluster software on all nodes at the same time, so that high-
> availability is lost. Then, your application crashes. Nothing’s there to 
> restart it
> anywhere.
>
> 4. Ultimate worst case: there could be a bug in the Linux kernel, especially
> somewhere in the network or I/O stack, that crashes all nodes
> simultaneously - especially on operations, where all of the nodes are doing
> the same thing, which is not that atypical for clusters - e.g., replication 
> to all
> nodes, or distributed locking, etc.
> It’s not even that unlikely.
>
> You might be shocked to hear that it has already happened to me - while
> developing or testing/experimenting, e.g. with experimental code. I have
> even crashed all nodes of an 8 node cluster simultaneously, and not just
> once. I have also had cases where my cluster fenced all its nodes.
> It’s not impossible - BUT it’s also not common on a well-tested production
> system that doesn’t continuously run tests of crazy corner cases like I do on
> my test systems.
>
> Obviously, adding more nodes does not solve any of those problems. But the
> real question is whether your use case is so critical that you really need to
> prevent any of those from occurring once (because those don’t seem to
> happen that often, otherwise we would have heard about it).
>
> If it’s really that level of critical, then you’re running the wrong 
> hardware, the
> wrong operating system and the wrong applications, and what you’re really
> looking for is a custom-designed high-reliability (not just high-availability)
> solution, with dissimilar hardware platforms, multiple independent code
> implementations, formally verified software design and implementation, etc.
> - like the ones used for special purpose medical equipment, safety-critical
> industrial equipment, avionics systems, nuclear reactor control, etc. - you 
> get
> the idea. Now you know why those aren’t allowed run on general-purpose
> hardware and software.
>

Those are all good points. Since the three legs of the information security 
triad are confidentiality, integrity, and availability, this is ultimately a 
security issue. We all know that information security is not about eliminating 
all possible risks, as that is an unattainable goal. It is about mitigating 
risks to acceptable levels. So I guess it boils down to how each person 
evaluates the risks in their own environment. Over my 38-year career, and 
especially the past 15 years of using Linux HA, I've seen more filesystem-type 
issues than the other possible issues you mentioned, so that one tends to 
feature more prominently on my risk radar.

Re: [DRBD-user] The Problem of File System Corruption w/DRBD

2021-06-03 Thread Eric Robinson
I guess I need to reiterate that I’ve been using DRBD in production clusters 
since 2006 and have been extremely happy with it. The purpose of my 
question is not to cast doubt or blame on DRBD for doing its job well. It's a 
good thing that DRBD faithfully replicates whatever is passed to it. However, 
since that is true, it does tend to enable the problem of filesystem corruption 
taking down a whole cluster. I'm just asking people for any suggestions they 
may have for alleviating that problem. If it’s not fixable, then it’s not 
fixable.



Part of the reason I’m asking is because we’re about to build a whole new data 
center, and after 15 years of using DRBD we are beginning to look at other HA 
options, mainly because of the filesystem as a weak point. I should mention 
that it has *never* happened before, but the thought of it is scary.



-Eric




From: drbd-user-boun...@lists.linbit.com  
On Behalf Of Yanni M.
Sent: Thursday, June 3, 2021 2:21 PM
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD

As others already mentioned, the job of DRBD is to faithfully and accurately 
replicate the data from the layers above it. So if there's corruption in the 
filesystem above the DRBD layer, DRBD will happily replicate it for you, the 
same way RAID1 would on a pair of HDDs. If you want to reduce the recovery 
time from such a situation, you could leverage the snapshot capability of the 
layers below DRBD (if thin LVM or ZFS are used) to roll back to a previous 
checkpoint, or implement HA at the layers above DRBD if the application you 
are using supports it; it really depends on the use case. That being said, 
filesystem corruption shouldn't be a common occurrence, and if it does occur 
you should investigate why it happened in the first place.



On Wed, 2 Jun 2021 at 22:50, Eric Robinson <eric.robin...@psmnv.com> wrote:
Since DRBD lives below the filesystem, if the filesystem gets corrupted, then 
DRBD faithfully replicates the corruption to the other node. Thus the 
filesystem is the SPOF in an otherwise shared-nothing architecture. What is the 
recommended way (if there is one) to avoid the filesystem SPOF problem when 
clusters are based on DRBD?

-Eric






Re: [DRBD-user] The Problem of File System Corruption w/DRBD

2021-06-03 Thread Eric Robinson
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com On Behalf Of Digimer
> Sent: Thursday, June 3, 2021 11:43 AM
> To: Robert Sander ; drbd-u...@lists.linbit.com
> Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD
>
> On 2021-06-03 11:09 a.m., Robert Sander wrote:
> > Hi,
> >
> > Am 03.06.21 um 14:50 schrieb Eric Robinson:
> >
> >> Yes, thanks, I've said for many years that HA is not a replacement for
> disaster recovery. Still, it is better to avoid downtime than to recover from 
> it,
> and one of the main ways to achieve that is through redundancy, preferably
> a shared-nothing approach. If I have a cool 5-node cluster and the whole
> thing goes down because the filesystem gets corrupted, I can restore from
> backup, but management is going to wonder why a 5-node cluster could not
> provide availability. So the question remains: how to eliminate the filesystem
> as the SPOF?
> >
> > Then eliminate the shared filesystem and replicate data on application
> > level.
> >
> > - MySQL has Galera
> > - Dovecot has dsync
> >
> > Regards
>
> Even this approach just moves the SPOF up from the FS to the SQL engine.
>
> The problem here is that you're still confusing redundancy with data
> integrity. To avoid data corruption, you need a layer that understands your
> data at a sufficient level to know what corruption looks like. Data integrity 
> is
> yet another topic, and still separate from HA.
>

> DRBD, and other HA tools, don't analyze the data, and nor should they
> (imagine the security and privacy concerns that would open up). If the HA
> layer is given data to replicate, its job is to faithfully and accurately
> replicate the data.
>

It seems like the two are sometimes intertwined. Is GFS2, for example, about 
integrity or redundancy? But I'm not really asking how to prevent filesystem 
corruption. I'm asking (perhaps stupidly) for the best/easiest way to make a 
filesystem redundant.

> I think the real solution is not technical, it's expectations management. Your
> managers need to understand what each part of their infrastructure does
> and does not do. This way, if the concerns around data corruption are
> sufficient, they can invest in tools to protect the data integrity at the 
> logical
> layer.
>
> HA protects against component failure. That's its job, and it does it well,
> when well implemented.
>

The filesystem is not a hardware component, but it is a cluster resource. The 
other cluster resources are redundant, with that sole exception. I'm just 
looking for a way around that problem. If there isn't one, then there isn't.

> --
> Digimer
> Papers and Projects: https://alteeve.com/w/ "I am, somehow, less
> interested in the weight and convolutions of Einstein’s brain than in the near
> certainty that people of equal talent have lived and died in cotton fields and
> sweatshops." - Stephen Jay Gould


Re: [DRBD-user] The Problem of File System Corruption w/DRBD

2021-06-03 Thread Eric Robinson
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com On Behalf Of Eddie Chapman
> Sent: Thursday, June 3, 2021 1:11 PM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD
>
> On 03/06/2021 13:50, Eric Robinson wrote:
> >> -Original Message-
> >> From: Digimer 
> >> Sent: Wednesday, June 2, 2021 7:23 PM
> >> To: Eric Robinson ;
> >> drbd-user@lists.linbit.com
> >> Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD
> >>
> >> On 2021-06-02 5:17 p.m., Eric Robinson wrote:
> >>> Since DRBD lives below the filesystem, if the filesystem gets
> >>> corrupted, then DRBD faithfully replicates the corruption to the
> >>> other node. Thus the filesystem is the SPOF in an otherwise
> >>> shared-nothing
> >> architecture.
> >>> What is the recommended way (if there is one) to avoid the
> >>> filesystem SPOF problem when clusters are based on DRBD?
> >>>
> >>> -Eric
> >>
> >> To start, HA, like RAID, is not a replacement for backups. That is
> >> the answer to a situation like this... HA (and other availability
> >> systems like RAID) protect against component failure. If a node
> >> fails, the peer recovers automatically and your services stay online.
> >> That's what DRBD and other HA solutions strive to provide; uptime.
> >>
> >> If you want to protect against corruption (accidental or intentional,
> >> a-la cryptolockers), you need a robust backup system to _complement_
> >> your HA solution.
> >>
> >
> > Yes, thanks, I've said for many years that HA is not a replacement for
> disaster recovery. Still, it is better to avoid downtime than to recover from 
> it,
> and one of the main ways to achieve that is through redundancy, preferably
> a shared-nothing approach. If I have a cool 5-node cluster and the whole
> thing goes down because the filesystem gets corrupted, I can restore from
> backup, but management is going to wonder why a 5-node cluster could not
> provide availability. So the question remains: how to eliminate the filesystem
> as the SPOF?
> >
>
> Some of the things being discussed here have nothing to do with drbd.
> drbd provides a raw block level device. It knows nothing about nor cares
> what layers you place above it, whether they be filesystems or some other
> block layer such as LVM or bcache.
>
> It does a very specific job; ensure the blocks you write to a drbd device get
> replicated and stored in real time on one or more other distributed hosts. If
> you write a 512byte size block of random garbage to a drbd device it will (and
> should) write the exact same garbage to the other distributed hosts too, so
> that if you read that same 512byte block back from any 1 of those individual
> hosts, you'll get the exact same garbage back.
>
> The OP stated "if the filesystem gets corrupted, then DRBD faithfully
> replicates the corruption to the other node." Good! That's exactly what we
> want it to do. What we definitely do NOT want is for drbd to manipulate the
> block data given to it in any way whatsoever, we want it to faithfully 
> replicate
> this.

No need to defend DRBD. We've been using it in production clusters since 2006 
and have been phenomenally happy with it. I'm not indicting DRBD at all. Yes, 
it's good that it faithfully replicates whatever is passed to it. However, 
since that is true, it does tend to enable the problem of filesystem corruption 
taking down a whole cluster. I'm just asking people for any suggestions they 
may have for alleviating that problem.

-Eric





Re: [DRBD-user] The Problem of File System Corruption w/DRBD

2021-06-03 Thread Eric Robinson
> -Original Message-
> From: Digimer 
> Sent: Wednesday, June 2, 2021 7:23 PM
> To: Eric Robinson ; drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD
>
> On 2021-06-02 5:17 p.m., Eric Robinson wrote:
> > Since DRBD lives below the filesystem, if the filesystem gets
> > corrupted, then DRBD faithfully replicates the corruption to the other
> > node. Thus the filesystem is the SPOF in an otherwise shared-nothing
> architecture.
> > What is the recommended way (if there is one) to avoid the filesystem
> > SPOF problem when clusters are based on DRBD?
> >
> > -Eric
>
> To start, HA, like RAID, is not a replacement for backups. That is the answer
> to a situation like this... HA (and other availability systems like RAID) 
> protect
> against component failure. If a node fails, the peer recovers automatically
> and your services stay online. That's what DRBD and other HA solutions strive
> to provide; uptime.
>
> If you want to protect against corruption (accidental or intentional, a-la
> cryptolockers), you need a robust backup system to _complement_ your HA
> solution.
>

Yes, thanks, I've said for many years that HA is not a replacement for disaster 
recovery. Still, it is better to avoid downtime than to recover from it, and 
one of the main ways to achieve that is through redundancy, preferably a 
shared-nothing approach. If I have a cool 5-node cluster and the whole thing 
goes down because the filesystem gets corrupted, I can restore from backup, but 
management is going to wonder why a 5-node cluster could not provide 
availability. So the question remains: how to eliminate the filesystem as the 
SPOF?

-Eric


[DRBD-user] The Problem of File System Corruption w/DRBD

2021-06-02 Thread Eric Robinson
Since DRBD lives below the filesystem, if the filesystem gets corrupted, then 
DRBD faithfully replicates the corruption to the other node. Thus the 
filesystem is the SPOF in an otherwise shared-nothing architecture. What is the 
recommended way (if there is one) to avoid the filesystem SPOF problem when 
clusters are based on DRBD?

-Eric






[DRBD-user] DRBD + VDO HowTo?

2021-05-13 Thread Eric Robinson
Can anyone point to a document on how to use VDO de-duplication with DRBD? 
Linbit has a blog page about it, but it was last updated 6 years ago and the 
embedded links are dead.

https://linbit.com/blog/albireo-virtual-data-optimizer-vdo-on-drbd/

-Eric






Re: [DRBD-user] Remove DRBD w/out Data Loss

2020-09-01 Thread Eric Robinson
Yannis –

Here’s what I don’t understand.

The backing device is logical volume: /dev/vg1/lv1

The drbd volume is: drbd0

The filesystem is ext4 on /dev/drbd0

Since the filesystem is built on /dev/drbd0, not on /dev/vg1/lv1, if we remove 
DRBD from the mix, can we then simply do:

# mount /dev/vg1/lv1 /mnt

...and find all the data there?

--Eric
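Yannis's point below holds because the filesystem begins at sector 0 of the backing device; with internal DRBD metadata, only the tail of the LV is occupied by DRBD. A hedged sketch of the conversion (the resource name "r0" and the assumption that the filesystem is unmounted are illustrative, not from the thread):

```
# On the node being converted to standalone:
drbdadm down r0                   # stop the DRBD device
mount -t ext4 /dev/vg1/lv1 /mnt   # data is all there; trailing DRBD metadata is ignored
# Optionally, wipe the DRBD metadata so tools no longer detect a "drbd" signature:
# drbdadm wipe-md r0
```

The explicit `-t ext4` guards against mount/libblkid identifying the device as type "drbd" and refusing to mount it.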


From: Yannis Milios 
Sent: Tuesday, September 1, 2020 4:47 AM
To: Eric Robinson 
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Remove DRBD w/out Data Loss

You mean completely removing DRBD while preserving the data on its backing 
device ? That should work out of the box, without any extra effort, as DRBD 
works as a transparent layer and it does not modify the data on the backing 
device.

Yannis




On Mon, 31 Aug 2020 at 09:39, Eric Robinson 
mailto:eric.robin...@psmnv.com>> wrote:

I may have missed this answer when I checked the manual, but we need to convert 
a DRBD cluster node into a standalone server without losing the data. Is that 
possible?

--Eric
--
Sent from Gmail Mobile


[DRBD-user] Remove DRBD w/out Data Loss

2020-08-31 Thread Eric Robinson
I may have missed this answer when I checked the manual, but we need to convert 
a DRBD cluster node into a standalone server without losing the data. Is that 
possible?

--Eric





[DRBD-user] DRBD sync stalled at 100% ?

2020-07-01 Thread Eric Robinson
Sorry for cross-posting this, but I'm not sure which list is the right one.

I'm not seeing anything on Google about this. Two DRBD nodes lost communication 
with each other, and then reconnected and started sync. But then it got to 100% 
and is just stalled there.

The nodes are 001db03a, 001db03b.

On 001db03a:

[root@001db03a ~]# drbdadm status
ha01_mysql role:Primary
  disk:UpToDate
  001db03b role:Secondary
replication:SyncSource peer-disk:Inconsistent done:100.00

ha02_mysql role:Secondary
  disk:UpToDate
  001db03b role:Primary
peer-disk:UpToDate

On 001drbd03b:

[root@001db03b ~]# drbdadm status
ha01_mysql role:Secondary
  disk:Inconsistent
  001db03a role:Primary
replication:SyncTarget peer-disk:UpToDate done:100.00

ha02_mysql role:Primary
  disk:UpToDate
  001db03a role:Secondary
peer-disk:UpToDate


On 001db03a, here are the DRBD messages from the onset of the problem until now.

Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: PingAck did not 
arrive in time.
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn( Connected -> 
NetworkFailure ) peer( Primary -> Unknown )
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( UpToDate -> 
Consistent )
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: pdsk( 
UpToDate -> DUnknown ) repl( Established -> Off )
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: ack_receiver 
terminated
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Terminating ack_recv 
thread
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql: Preparing cluster-wide state 
change 2946943372 (1->-1 0/0)
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql: Committing cluster-wide state 
change 2946943372 (6ms)
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( Consistent -> 
UpToDate )
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Connection closed
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn( NetworkFailure 
-> Unconnected )
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: Restarting receiver 
thread
Jun 26 22:34:27 001db03a kernel: drbd ha02_mysql 001db03b: conn( Unconnected -> 
Connecting )
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: PingAck did not 
arrive in time.
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn( Connected -> 
NetworkFailure ) peer( Secondary -> Unknown )
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql/0 drbd0 001db03b: pdsk( 
UpToDate -> DUnknown ) repl( Established -> Off )
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: ack_receiver 
terminated
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Terminating ack_recv 
thread
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql/0 drbd0: new current UUID: 
D07A3D4B2F99832D weak: FFFD
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Connection closed
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn( NetworkFailure 
-> Unconnected )
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: Restarting receiver 
thread
Jun 26 22:34:30 001db03a kernel: drbd ha01_mysql 001db03b: conn( Unconnected -> 
Connecting )
Jun 26 22:34:33 001db03a pengine[1474]:  notice:  * Start  p_drbd0:1
( 001db03b )
Jun 26 22:34:33 001db03a crmd[1475]:  notice: Initiating notify operation 
p_drbd0_pre_notify_start_0 locally on 001db03a
Jun 26 22:34:33 001db03a crmd[1475]:  notice: Result of notify operation for 
p_drbd0 on 001db03a: 0 (ok)
Jun 26 22:34:33 001db03a crmd[1475]:  notice: Initiating start operation 
p_drbd0_start_0 on 001db03b
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Handshake to peer 0 
successful: Agreed network protocol version 113
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Feature flags 
enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Starting ack_recv 
thread (from drbd_r_ha02_mys [2116])
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Preparing remote 
state change 3920461435
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: Committing remote 
state change 3920461435 (primary_nodes=1)
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql 001db03b: conn( Connecting -> 
Connected ) peer( Unknown -> Primary )
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1: disk( UpToDate -> 
Outdated )
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: 
drbd_sync_handshake:
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: self 
492F8D33A72A8E08::659DC04F5C85B6E4:8254EEA2EC50AD7C bits:0 
flags:120
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: peer 
5A6B1EBE80500C39:492F8D33A72A8E09:659DC04F5C85B6E4:51A00A23ED88187A bits:1 
flags:120
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: 
uuid_compare()=-2 by rule 50
Jun 26 22:34:34 001db03a kernel: drbd ha02_mysql/0 drbd1 001db03b: 

Re: [DRBD-user] Checking Runtime Configuration

2020-04-13 Thread Eric Robinson
Precisely it! Thanks much. Believe it or not, I did look around first. 


From: Gianni Milo 
Sent: Sunday, April 12, 2020 12:44 AM
To: Eric Robinson 
Cc: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Checking Runtime Configuration

I think you are looking for 'drbdsetup show <resource> --show-defaults'.

Perhaps not 100% what you asked for, but close enough.
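For reference, the invocation would look like the following (the resource name "r0" is a placeholder):

```
# Show the effective runtime settings for resource r0, including defaults:
drbdsetup show r0 --show-defaults

# Narrow it down to the timers in question:
drbdsetup show r0 --show-defaults | grep -E 'timeout|ping-int|connect-int'
```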

G.

On Sat, 11 Apr 2020 at 17:45, Eric Robinson 
mailto:eric.robin...@psmnv.com>> wrote:
If I want to know what the current run-time values of timeout, ping-int, or 
connect-int are, how do I check that? That's assuming they may not be the same 
as what shows in the config file.

--Eric




[DRBD-user] Checking Runtime Configuration

2020-04-11 Thread Eric Robinson
If I want to know what the current run-time values of timeout, ping-int, or 
connect-int are, how do I check that? That's assuming they may not be the same 
as what shows in the config file.

--Eric




[DRBD-user] LVM Global Filter Question

2020-03-30 Thread Eric Robinson
Greetings,

I come here about once a year looking for help understanding LVM filtering with 
DRBD. I've read the relevant sections of the DRBD User Guide, visited various 
web sites, and spoken to people in the list, but I'm still looking for a simple 
rule of thumb.

Basically, here is what I've come to understand:


  1.  If DRBD lives *above* an LVM volume, then the default LVM filtering 
settings are fine; no changes are required to lvm.conf. The LVs will remain 
active on all DRBD nodes, and an LVM resource agent is not required.

  2.  If DRBD lives *below* or *between* LVM volumes, then:

 *   set global_filter to reject the DRBD backing devices
 *   set write_cache_state = 0
 *   set use_lvmetad = 0
 *   set volume_list to include the block devices required to boot
 *   remove /etc/lvm/cache/.cache
 *   run lvscan
 *   regenerate the initrd
 *   reboot
 *   use a cluster resource agent to activate/deactivate LVs as required 
by cluster operation
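As a rough illustration of the second case, the relevant lvm.conf settings might look like this (device and VG names are made up for the example; adjust them to your stack):

```
# /etc/lvm/lvm.conf -- illustrative fragment only
devices {
    # reject the DRBD backing LV so LVM sees the PV only via /dev/drbdX
    global_filter = [ "r|^/dev/vg00/lv_under_drbd0$|", "a|.*|" ]
    write_cache_state = 0
}
global {
    use_lvmetad = 0
}
activation {
    # only the VGs needed at boot are auto-activated; the cluster
    # resource agent activates the DRBD-backed VG on the active node
    volume_list = [ "vg_root" ]
}
```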

Please feel free to correct any mistakes.

--Eric






[DRBD-user] LVM Above and Below DRBD?

2020-01-11 Thread Eric Robinson
What should the LVM global filter look like if we have LVs below and above DRBD?

In drbd.conf, the backing device is as follows...

disk   /dev/vg00/lv_under_drbd0;

vg00 is on sda3

But we also have an LV built on top of the drbd disk, as follows...

# lvdisplay

  --- Logical volume ---
  LV Path/dev/vg_on_drbd0/lv_on_drbd0
  LV Namelv_on_drbd0
  VG Namevg_on_drbd0

So how would I write my accept and reject regexes in lvm.conf?

--Eric
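One possibility, sketched from the device names in this message (treat it as a starting point, not a verified configuration): accept the PV that backs vg00 and the DRBD device that backs vg_on_drbd0, and reject everything else, so that LVM never scans /dev/vg00/lv_under_drbd0 and finds a duplicate copy of the PV signature that also appears on /dev/drbd0:

```
# /etc/lvm/lvm.conf (illustrative)
global_filter = [ "a|^/dev/sda3$|", "a|^/dev/drbd.*|", "r|.*|" ]
```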




[DRBD-user] Stupid LVM filter question

2019-10-28 Thread Eric Robinson
If I have an LV as a backing device for a DRBD disk, can someone explain why I 
need an LVM filter at all? It seems to me that we would want the LV to be 
always active on both the primary and secondary DRBD resources, and there 
should be no need or desire to have the LV activated or deactivated in ad-hoc 
fashion by Pacemaker. What am I missing?

--Eric



[DRBD-user] LVM global_filter question

2019-10-27 Thread Eric Robinson
I have a server with the following storage configuration, and I need to know 
how to write my LVM global_filter.



Here’s the storage stack for each block device. “vg” means volume group, “lv” 
means logical volume.



sda1, sdb1 -> md0 [raid1] -> /boot [ext4]

sda2, sdb2 -> md1 [raid1] -> /home [ext4]

sda3, sdb3 -> md2 [raid1] -> vg_root/lv_root -> / [ext4]

sda4, sdb4 -> md3 [raid1] -> vg_future/lv_future (currently unused)

nvme0n1p1, nvme1n1p1 -> md4 [raid1] -> vg_under_drbd0/lv_under_drbd0 -> drbd0 
-> /data1 [ext4]

nvme2n1p1, nvme3n1p1 -> md5 [raid1] -> vg_under_drbd1/lv_under_drbd1 -> drbd1 
-> /data2 [ext4]



If I understand correctly, it should be…



global_filter = [ "a|^/dev/md.*$|", "r/.*/" ]



Is that correct?



--Eric
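The accept pattern looks plausible for this stack, since every PV sits on an md device. A quick, hedged way to sanity-check the pattern against the device paths listed above (plain grep, so this only exercises the regex, not LVM itself):

```shell
# Accept anything matching ^/dev/md, reject the rest -- mirrors
# global_filter = [ "a|^/dev/md.*$|", "r/.*/" ]
check() {
    if printf '%s\n' "$1" | grep -qE '^/dev/md'; then
        echo "accept $1"
    else
        echo "reject $1"
    fi
}
check /dev/md4          # PV under vg_under_drbd0 -> accept
check /dev/nvme0n1p1    # raw member of md4 -> reject
check /dev/drbd0        # DRBD device -> reject
```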









[DRBD-user] Is it Okay to Ask for Paid Consulting Assistance?

2018-09-21 Thread Eric Robinson
Hey all, I'm just checking with the list about this. We've been using drbd for 
more than a decade, but there are lots of things we don't know. We don't need 
help very often. When we do, we just need a little bit, so a Linbit support 
contract does not make financial sense for us. Is it okay to send a message to 
the list offering to hire someone to provide knowledgeable drbd support? We 
used to call hastexo for that kind of thing, but I guess they don't do it 
anymore.

--Eric

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Resync Incredibly Slow

2018-08-20 Thread Eric Robinson
> On Sun, Aug 19, 2018 at 1:39 AM, Eric Robinson 
> wrote:
> >
> > I’ve been using drbd for a decade, but this is my first experience
> > with drbd 9.0.14. The resync is incredibly slow. Normally, resync
> > takes a few seconds to a few minutes, but this is taking hours,
> > advancing just a few hundredths of a percent every few seconds…
> >
> 
> Hi Eric,
> 
> I am wondering, are you running drbd in physical hosts, or virtual machines?
> 
> I am in a worse situation with the same version of drbd, installed in two up 
> to
> date Centos 7 virtual machines running in an OpenStack pike environment, with
> the same version of drbd installed from the elrepo
> repository:

We have Centos 7.5 servers running in Microsoft Azure. 

> 
> [root@drbd1 ~]# rpm -qa | grep kmod-drbd
> kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
> 
> I configured this environment last week Thursday, and believe it or not, as of
> today sync is at 0.1%:
> 
> [root@drbd1 ~]# cat
> /sys/kernel/debug/drbd/resources/drbd0/connections/drbd2.novalocal/0/proc_
> drbd
>  0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-
> ns:6363972 nr:0 dw:0 dr:112432036 al:0 bm:0 lo:0 pe:[0;4] ua:52 ap:[0;0]
> ep:1 wo:1 oos:5242556
> [>] sync'ed:  0.1% (5116/5116)M
> finish: 127:25:23 speed: 0 (0 -- 0) K/sec
>   0% sector pos: 0/10485368
> resync: used:1/61 hits:62 misses:2 starving:0 locked:0 changed:1
> act_log: used:0/1237 hits:0 misses:0 starving:0 locked:0 changed:0
> blocked on activity log: 0
> 
> Whatever I did, did not help, so finally I replicated the environment in my
> laptop, in a VMware Workstation 11 environment, using the same version of
> drbd, and in this case I have no issues.
> 

Hmm. I have absolutely no idea what to say about that. If I did, my own issue 
would be fixed. 

> This is my first experience with drbd, so I do not know how to troubleshoot.
> Now I am thinking to look into older 8.x version, that might be working 
> better.
> 
> Regards,
> Adrian


[DRBD-user] Resync Incredibly Slow

2018-08-18 Thread Eric Robinson
I've been using drbd for a decade, but this is my first experience with drbd 
9.0.14. The resync is incredibly slow. Normally, resync takes a few seconds to 
a few minutes, but this is taking hours, advancing just a few hundredths of a 
percent every few seconds...


ha01_mysql role:Secondary
  disk:Inconsistent
  001db01b role:Primary
replication:SyncTarget peer-disk:UpToDate done:93.84

ha02_mysql role:Primary
  disk:UpToDate
  001db01b role:Secondary
replication:SyncSource peer-disk:Inconsistent done:84.49


Also, the numbers shown by drbdadm status are way different than what is shown 
under /sys...


[root@001db01b 0]# cat 
/sys/kernel/debug/drbd/resources/ha02_mysql/connections/001db01a/0/proc_drbd
1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-
ns:8 nr:175023740 dw:657162492 dr:615264242 al:2325 bm:0 lo:0 pe:[0;0] ua:0 
ap:[0;0] ep:1 wo:2 oos:307063536
[==>.] sync'ed: 36.4% (299864/470788)M
finish: 1:00:41 speed: 84,300 (91,452 -- 80,028) want: 614,400 K/sec
 11% sector pos: 373912576/3221127096
resync: used:0/61 hits:696233 misses:2854 starving:0 locked:0 
changed:1427
act_log: used:0/1237 hits:842881 misses:5965929 starving:0 locked:0 
changed:118214
blocked on activity log: 0
[root@001db01b 0]# cat 
/sys/kernel/debug/drbd/resources/ha01_mysql/connections/001db01a/0/proc_drbd
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-
ns:370199208 nr:580 dw:1039815524 dr:197034909 al:50060 bm:0 lo:0 pe:[0;0] 
ua:0 ap:[0;0] ep:1 wo:2 oos:105671448
[=>..] sync'ed: 10.0% (103192/114580)M
finish: 3:50:34 speed: 7,632 (2,804 -- 4,988) K/sec
  0% sector pos: 0/3221127096
resync: used:0/61 hits:188990 misses:364 starving:0 locked:0 changed:186
act_log: used:0/1237 hits:173070247 misses:3163320 starving:0 locked:0 
changed:130073
blocked on activity log: 0


[sig]



Re: [DRBD-user] drbd+lvm no bueno

2018-07-30 Thread Eric Robinson
> > > > Lars,
> > > >
> > > > I put MySQL databases on the drbd volume. To back them up, I pause
> > > > them and do LVM snapshots (then rsync the snapshots to an archive
> > > > server). How could I do that with LVM below drbd, since what I
> > > > want is a snapshot of the filesystem where MySQL lives?
> > >
> > > You just snapshot below DRBD, after quiescing the mysql db.
> > >
> > > DRBD is transparent, the "garbage" (to the filesystem) of the
> > > "trailing drbd meta data" is of no concern.
> > > You may have to "mount -t ext4" (or xfs or whatever), if your mount
> > > and libblkid decide that this was a "drbd" type and could not be
> > > mounted. They are just trying to help, really.
> > > which is good. but in that case they get it wrong.
> >
> > Okay, just so I understand
> >
> > Suppose I turn md4 into a PV and create one volume group
> > 'vg_under_drbd0', and logical volume 'lv_under_drbd0' that takes 95%
> > of the space, leaving 5% for snapshots.
> >
> > Then I create my ext4 filesystem directly on drbd0.
> >
> > At backup time, I quiesce the MySQL instances and create a snapshot of
> > the drbd disk.
> >
> > I can then mount the drbd snapshot as a filesystem?
> 
> Yes.
> Though obviously, those snapshots won't "failover", in case you have a node
> failure and failover during the backup.
> Snapshots in a VG "on top of" DRBD do failover.

Your advice (and Veit's) was spot on. I rebuilt everything with LVM under drbd 
instead of over it, added the appropriate filter in lvm.conf, and rebuilt my 
initramfs, and everything is working great. Failover works as expected without 
volume activation problems, and I could snapshot the drbd volume and mount it 
as a filesystem. You all hit it out of the park. Thanks!
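For anyone finding this thread later, the resulting backup flow looks roughly like this (volume names are taken from earlier in the thread; the snapshot size, mount point, and archive destination are assumptions):

```
# 1. Quiesce MySQL (e.g. FLUSH TABLES WITH READ LOCK held in an open session)
# 2. Snapshot the backing LV underneath DRBD:
lvcreate -s -L30G -n drbd0_snapshot /dev/vg_under_drbd0/lv_under_drbd0
# 3. Release the lock, then back up from the snapshot.
#    The explicit -t ext4 avoids blkid identifying the snapshot as type "drbd":
mount -t ext4 -o ro /dev/vg_under_drbd0/drbd0_snapshot /mnt/snap
rsync -a /mnt/snap/ archive:/backups/ha01_mysql/
umount /mnt/snap
lvremove -y /dev/vg_under_drbd0/drbd0_snapshot
```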




Re: [DRBD-user] drbd+lvm no bueno

2018-07-28 Thread Eric Robinson





> -Original Message-
> From: Eric Robinson
> Sent: Saturday, July 28, 2018 7:39 AM
> To: Lars Ellenberg ; drbd-user@lists.linbit.com
> Subject: RE: [DRBD-user] drbd+lvm no bueno
> 
> > > > Lars,
> > > >
> > > > I put MySQL databases on the drbd volume. To back them up, I pause
> > > > them and do LVM snapshots (then rsync the snapshots to an archive
> > > > server). How could I do that with LVM below drbd, since what I
> > > > want is a snapshot of the filesystem where MySQL lives?
> > >
> > > You just snapshot below DRBD, after quiescing the mysql db.
> > >
> > > DRBD is transparent, the "garbage" (to the filesystem) of the
> > > "trailing drbd meta data" is of no concern.
> > > You may have to "mount -t ext4" (or xfs or whatever), if your mount
> > > and libblkid decide that this was a "drbd" type and could not be
> > > mounted. They are just trying to help, really.
> > > which is good. but in that case they get it wrong.
> >
> > Okay, just so I understand
> >
> > Suppose I turn md4 into a PV and create one volume group
> > 'vg_under_drbd0', and logical volume 'lv_under_drbd0' that takes 95%
> > of the space, leaving 5% for snapshots.
> >
> > Then I create my ext4 filesystem directly on drbd0.
> >
> > At backup time, I quiesce the MySQL instances and create a snapshot of
> > the drbd disk.
> >
> > I can then mount the drbd snapshot as a filesystem?
> >
> 
> Disregard question. I tested it. Works fine. Mind blown.
> 
> -Eric
> 

Although I discovered quite by accident that you can mount a snapshot over the 
top of the filesystem that exists on the device that it's a snapshot of. 
Wouldn't this create some sort of recursive write death spiral?

Check it out...

root@001db01a /]# lvdisplay
  --- Logical volume ---
  LV Path/dev/vg_under_drbd1/lv_under_drbd1
  LV Namelv_under_drbd1
  VG Namevg_under_drbd1
  LV UUIDLWWPiL-Y6nR-cNnW-j2E9-LAK9-UsXm-3inTyJ
  LV Write Accessread/write
  LV Creation host, time 001db01a, 2018-07-28 04:53:14 +
  LV Status  available
  # open 2
  LV Size1.40 TiB
  Current LE 367002
  Segments   1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 8192
  Block device   253:1

  --- Logical volume ---
  LV Path/dev/vg_under_drbd0/lv_under_drbd0
  LV Namelv_under_drbd0
  VG Namevg_under_drbd0
  LV UUIDM2oMNd-hots-d9Pf-KQG8-YPqh-6x3a-r6wBqo
  LV Write Accessread/write
  LV Creation host, time 001db01a, 2018-07-28 04:52:59 +
  LV Status  available
  # open 2
  LV Size1.40 TiB
  Current LE 367002
  Segments   1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 8192
  Block device   253:0

[root@001db01a /]# df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda230G  3.3G   27G  12% /
devtmpfs 63G 0   63G   0% /dev
tmpfs63G 0   63G   0% /dev/shm
tmpfs63G  9.0M   63G   1% /run
tmpfs63G 0   63G   0% /sys/fs/cgroup
/dev/sda1   497M   78M  420M  16% /boot
/dev/sdb1   252G   61M  239G   1% /mnt/resource
tmpfs13G 0   13G   0% /run/user/0
/dev/drbd0  1.4T  2.1G  1.4T   1% /ha01_mysql
[root@001db01a /]#
[root@001db01a /]# ls /ha01_mysql
lost+found  testfile
[root@001db01a /]#
[root@001db01a /]# lvcreate -s -L30G -n drbd0_snapshot 
/dev/vg_under_drbd0/lv_under_drbd0
  Logical volume "drbd0_snapshot" created.
[root@001db01a /]#
[root@001db01a /]# mount /dev/vg_under_drbd0/drbd0_snapshot /ha01_mysql
[root@001db01a /]#
[root@001db01a /]# cd /ha01_mysql
[root@001db01a ha01_mysql]# ls
lost+found  testfile
[root@001db01a ha01_mysql]# echo blah > blah.txt
[root@001db01a ha01_mysql]# ll
total 2097172
-rw-r--r--. 1 root root  5 Jul 28 14:50 blah.txt
drwx--. 2 root root  16384 Jul 28 14:10 lost+found
-rw-r--r--. 1 root root 2147479552 Jul 28 14:20 testfile
[root@001db01a ha01_mysql]# cd /
[root@001db01a /]# umount /ha01_mysql
[root@001db01a /]# ls /ha01_mysql
lost+found  testfile
[root@001db01a /]#
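The cycle Eric tested above can be sketched as one script. This is a sketch only: the VG/LV/mount/archive names are this thread's examples, the MySQL quiesce step is simplified, and the script is dry-run by default (it prints the commands; set DRY_RUN=0 to execute them as root).

```shell
#!/bin/sh
# Sketch of the backup cycle from this thread (snapshot taken *below* DRBD).
# Names (vg_under_drbd0, /ha01_mysql archive path) are examples, not a
# prescription. Dry-run by default; DRY_RUN=0 runs the real commands as root.
set -e
DRY_RUN=${DRY_RUN:-1}
VG=vg_under_drbd0
LV=lv_under_drbd0
SNAP=drbd0_snapshot
MNT=/mnt/drbd0_snapshot

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

backup() {
    # Quiesce MySQL. (Simplified: FLUSH TABLES WITH READ LOCK only holds
    # while the issuing session stays open, so in practice keep one client
    # session open across the lvcreate step.)
    run mysql -e 'FLUSH TABLES WITH READ LOCK'
    # Snapshot below DRBD: the backing LV, not /dev/drbd0.
    run lvcreate -s -L30G -n "$SNAP" "/dev/$VG/$LV"
    run mysql -e 'UNLOCK TABLES'
    # Mount with an explicit -t ext4: libblkid may type the snapshot as
    # "drbd" because of the trailing DRBD metadata.
    run mkdir -p "$MNT"
    run mount -t ext4 -o ro "/dev/$VG/$SNAP" "$MNT"
    run rsync -a "$MNT/" archive:/backups/ha01_mysql/
    run umount "$MNT"
    run lvremove -f "/dev/$VG/$SNAP"
}

out=$(backup)          # capture so a dry run can be inspected
printf '%s\n' "$out"
```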

What? I know nothing.

--Eric
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd+lvm no bueno

2018-07-28 Thread Eric Robinson
> > > Lars,
> > >
> > > I put MySQL databases on the drbd volume. To back them up, I pause
> > > them and do LVM snapshots (then rsync the snapshots to an archive
> > > server). How could I do that with LVM below drbd, since what I want
> > > is a snapshot of the filesystem where MySQL lives?
> >
> > You just snapshot below DRBD, after quiescing the mysql db.
> >
> > DRBD is transparent, the "garbage" (to the filesystem) of the
> > "trailing drbd meta data" is of no concern.
> > You may have to "mount -t ext4" (or xfs or whatever), if your mount
> > and libblkid decide that this was a "drbd" type and could not be
> > mounted. They are just trying to help, really.
> > Which is good, but in that case they get it wrong.
> 
> Okay, just so I understand
> 
> Suppose I turn md4 into a PV and create one volume group 'vg_under_drbd0',
> and logical volume 'lv_under_drbd0' that takes 95% of the space, leaving 5%
> for snapshots.
> 
> Then I create my ext4 filesystem directly on drbd0.
> 
> At backup time, I quiesce the MySQL instances and create a snapshot of the
> drbd disk.
> 
> I can then mount the drbd snapshot as a filesystem?
> 

Disregard question. I tested it. Works fine. Mind blown.

-Eric




Re: [DRBD-user] drbd+lvm no bueno

2018-07-27 Thread Eric Robinson
> > Lars,
> >
> > I put MySQL databases on the drbd volume. To back them up, I pause
> > them and do LVM snapshots (then rsync the snapshots to an archive
> > server). How could I do that with LVM below drbd, since what I want is
> > a snapshot of the filesystem where MySQL lives?
> 
> You just snapshot below DRBD, after quiescing the mysql db.
> 
> DRBD is transparent, the "garbage" (to the filesystem) of the "trailing drbd
> meta data" is of no concern.
> You may have to "mount -t ext4" (or xfs or whatever), if your mount and
> libblkid decide that this was a "drbd" type and could not be mounted. They are
> just trying to help, really.
> Which is good, but in that case they get it wrong.

Okay, just so I understand

Suppose I turn md4 into a PV and create one volume group 'vg_under_drbd0', and 
logical volume 'lv_under_drbd0' that takes 95% of the space, leaving 5% for 
snapshots.

Then I create my ext4 filesystem directly on drbd0.

At backup time, I quiesce the MySQL instances and create a snapshot of the drbd 
disk.

I can then mount the drbd snapshot as a filesystem?   
 
> 
> > How severely does putting LVM on top of drbd affect performance?
> 
> It's not the "putting LVM on top of drbd" part.
> it's what most people think when doing that:
> use a huge single DRBD as PV, and put loads of unrelated LVS inside of that.
> 
> Which then all share the single DRBD "activity log" of the single DRBD volume,
> which then becomes a bottleneck for IOPS.
> 

I currently have one big drbd disk with one volume group over it and one 
logical volume that takes up 95% of the space, leaving 5% of the volume group 
for snapshots. I run multiple instances of MySQL out of different directories. 
I don't see a way to avoid the activity log bottleneck problem.




Re: [DRBD-user] drbd+lvm no bueno

2018-07-26 Thread Eric Robinson


> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Robert Altnoeder
> Sent: Thursday, July 26, 2018 5:12 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] drbd+lvm no bueno
> 
> On 07/26/2018 08:50 AM, Eric Robinson wrote:
> >
> >
> > Failed Actions:
> >
> > * p_lv_on_drbd1_start_0 on ha16b 'not running' (7): call=68,
> > status=complete, exitreason='LVM: vg_on_drbd1 did not activate
> > correctly',
> >
> >     last-rc-change='Wed Jul 25 22:36:37 2018', queued=0ms, exec=401ms
> >
> >
> >
> > The storage stack is:
> >
> >
> >
> > md4 -> drbd -> lvm -> filesystem
> >
> 
> This is most probably an LVM configuration error. Any LVM volume group on
> top of DRBD must be deactivated/stopped whenever DRBD is Secondary and
> must be started whenever DRBD is Primary, and LVM must be prevented from
> finding and using the storage device that DRBD uses as its backend, which it
> would normally do, because it can see the LVM physical volume signature not
> only on the DRBD device, but also on the backing device that DRBD uses.
> 

Would there really be a PV signature on the backing device? I didn't turn md4 
into a PV (did not run pvcreate /dev/md4), but I did turn the drbd disk into 
one (pvcreate /dev/drbd1). 

-Eric 
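To answer the question above: with internal DRBD metadata the data area starts at offset 0 of the backing device, so the PV label written by `pvcreate /dev/drbd1` also appears at the start of md4. The usual remedy for Robert's point is an LVM device filter, so LVM only ever sees the PV through /dev/drbd1. A hedged sketch for this thread's md4 -> drbd1 stack (device names are this thread's examples; on el7, `global_filter` plus a rebuilt initramfs is the more robust variant):

```text
# /etc/lvm/lvm.conf -- sketch for the md4 -> drbd1 -> LVM stack in this thread
devices {
    # Reject the backing device so its copy of the PV signature is never
    # scanned or activated directly:
    global_filter = [ "r|^/dev/md4$|" ]
    # Stricter alternative: accept only DRBD and the system disk, reject
    # everything else, e.g.:
    # global_filter = [ "a|^/dev/drbd.*|", "a|^/dev/sda.*|", "r|.*|" ]
    # On el7, also consider disabling lvmetad so the filter is honored
    # everywhere:
    # use_lvmetad = 0
}
```

After changing the filter, rebuild the initramfs so early-boot LVM scanning uses the same rules.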


Re: [DRBD-user] drbd+lvm no bueno

2018-07-26 Thread Eric Robinson
Thank you, I will check that out.

From: Jaco van Niekerk [mailto:j...@desktop.co.za]
Sent: Thursday, July 26, 2018 3:34 AM
To: Eric Robinson ; drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] drbd+lvm no bueno


Hi

Check your LVM configuration:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/configuring_the_red_hat_high_availability_add-on_with_pacemaker/s1-exclusiveactive-haaa

Regards

Jaco van Niekerk

Office:   011 608 2663  E-mail:  j...@desktop.co.za
On 26/07/2018 11:35, Eric Robinson wrote:
Using drbd 9.0.14, I am having trouble getting resources to move between 
nodes. I get...

Failed Actions:
* p_lv_on_drbd1_start_0 on ha16b 'not running' (7): call=68, status=complete, 
exitreason='LVM: vg_on_drbd1 did not activate correctly',
last-rc-change='Wed Jul 25 22:36:37 2018', queued=0ms, exec=401ms

The storage stack is:

md4 -> drbd -> lvm -> filesystem

--Eric



[DRBD-user] drbd+lvm no bueno

2018-07-26 Thread Eric Robinson
Using drbd 9.0.14, I am having trouble getting resources to move between 
nodes. I get...

Failed Actions:
* p_lv_on_drbd1_start_0 on ha16b 'not running' (7): call=68, status=complete, 
exitreason='LVM: vg_on_drbd1 did not activate correctly',
last-rc-change='Wed Jul 25 22:36:37 2018', queued=0ms, exec=401ms

The storage stack is:

md4 -> drbd -> lvm -> filesystem

--Eric


Re: [DRBD-user] Error Installing Module

2018-07-11 Thread Eric Robinson
> You need to reboot into the new kernel you just installed.

D'oh.

Thanks. Worked like a charm. 

--Eric


[DRBD-user] Error Installing Module

2018-07-10 Thread Eric Robinson
Just did a fresh install of CentOS 7.4.1708.

DRBD installation seemed to go fine, but when I try to insert the module, I get 
an error:

[root@ha16a etc]# insmod 
/usr/lib/modules/3.10.0-862.6.3.el7.x86_64/weak-updates/drbd90/drbd.ko
insmod: ERROR: could not insert module 
/usr/lib/modules/3.10.0-862.6.3.el7.x86_64/weak-updates/drbd90/drbd.ko: Unknown 
symbol in module

[root@ha16a etc]# insmod 
/usr/lib/modules/3.10.0-862.el7.x86_64/extra/drbd90/drbd.ko
insmod: ERROR: could not insert module 
/usr/lib/modules/3.10.0-862.el7.x86_64/extra/drbd90/drbd.ko: Unknown symbol in 
module


Here is the full install process, if it helps.

[root@ha16a etc]# yum install drbd90-utils kmod-drbd90
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.xmission.com
* elrepo: repos.lax-noc.com
* extras: repos.lax.quadranet.com
* updates: mirrors.sonic.net
Resolving Dependencies
--> Running transaction check
---> Package drbd90-utils.x86_64 0:9.3.1-1.el7.elrepo will be installed
---> Package kmod-drbd90.x86_64 0:9.0.14-1.el7_5.elrepo will be installed
--> Processing Dependency: kernel(sme_me_mask) = 0x17fbce60 for package: 
kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Processing Dependency: kernel(genl_register_family) = 0x57dc0635 for 
package: kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Processing Dependency: kernel(__x86_indirect_thunk_rdx) = 0xb601be4c for 
package: kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Processing Dependency: kernel(__x86_indirect_thunk_rcx) = 0xc29957c3 for 
package: kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Processing Dependency: kernel(__x86_indirect_thunk_rbx) = 0x593c1bac for 
package: kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Processing Dependency: kernel(__x86_indirect_thunk_rax) = 0x2ea2c95c for 
package: kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Processing Dependency: kernel(__x86_indirect_thunk_r15) = 0x0f05c7b8 for 
package: kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Processing Dependency: kernel(__x86_indirect_thunk_r14) = 0xce8b1878 for 
package: kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Processing Dependency: kernel(__x86_indirect_thunk_r13) = 0xe7b00dfb for 
package: kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Processing Dependency: kernel(__x86_indirect_thunk_r12) = 0x263ed23b for 
package: kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Processing Dependency: kernel >= 3.10.0-862.el7 for package: 
kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
--> Running transaction check
---> Package kernel.x86_64 0:3.10.0-862.6.3.el7 will be installed
--> Processing Dependency: linux-firmware >= 20180113-61 for package: 
kernel-3.10.0-862.6.3.el7.x86_64
--> Running transaction check
---> Package linux-firmware.noarch 0:20170606-56.gitc990aae.el7 will be updated
---> Package linux-firmware.noarch 0:20180220-62.2.git6d51311.el7_5 will be an 
update
--> Finished Dependency Resolution

Dependencies Resolved

===============================================================================
 Package          Arch     Version                          Repository    Size
===============================================================================
Installing:
 drbd90-utils     x86_64   9.3.1-1.el7.elrepo               elrepo       680 k
 kernel           x86_64   3.10.0-862.6.3.el7               updates       46 M
 kmod-drbd90      x86_64   9.0.14-1.el7_5.elrepo            elrepo       266 k
Updating for dependencies:
 linux-firmware   noarch   20180220-62.2.git6d51311.el7_5   updates       57 M

Transaction Summary
===============================================================================
Install  3 Packages
Upgrade  ( 1 Dependent package)

Total size: 104 M
Total download size: 947 k
Is this ok [y/d/N]: y
Downloading packages:
(1/2): kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64.rpm 
   

Re: [DRBD-user] Updating Kernel w/out Updating DRBD

2018-06-23 Thread Eric Robinson
> On Fri, Jun 22, 2018 at 02:21:10PM +0000, Eric Robinson wrote:
> > > > Also, I find it odd that the option to build from source is only
> > > > in the DRBD 8.3 User Guide and was left out of the 8.4 and 9.X
> > > > User Guides. (I'm sure the reason is obvious to everyone else I
> > > > just missed
> > > > something.)
> > >
> > > As I maintain parts of the build system and the documentation
> > > framework I can answer that (happened at least once here on the ML):
> > >
> > > - If information is in the UG and is wrong, it is a bug.
> > > - Documenting all possible build flavors for all distributions we
> > >   support is too much maintenance work. Looking at the build system and
> > >   the various hacks we need for outdated distributions (especially the
> > >   rpm ones with different and broken macros), that would fill pages in
> > >   the UG to *really* get it right. And I'm reluctant to accept patches
> > >   for "the general case", it always outdates and needs maintenance for
> > >   basically no gain. If you need (special kinds of) packages, you are a)
> > >   clever enough to figure it out on your own, or b) you let somebody
> > >   else figure that out for you. Thanks to Veith for his great summary,
> > >   we as LINBIT obviously also provide all kinds of packages for various
> > >   kinds of distributions for our customers.
> > >
> > > Regards, rck
> >
> > Thanks for the clarification. May I suggest that a brief comment such
> > as you provided above would be an excellent addition to the UG? Since
> > the build instructions have always worked for me for the past twelve
> > years, it threw me for a loop when they silently disappeared from the
> > 8.4 and 9.X user guides. I was constantly wondering if I was doing
> > something wrong by using the old instructions as a guideline.
> >
> 
> Fair enough, I put it on the TODO list for the UGs.
> 
> Regards, rck

Thank you very much!

--Eric




Re: [DRBD-user] Updating Kernel w/out Updating DRBD

2018-06-22 Thread Eric Robinson
> > Also, I find it odd that the option to build from source is only in
> > the DRBD 8.3 User Guide and was left out of the 8.4 and 9.X User
> > Guides. (I'm sure the reason is obvious to everyone else I just missed
> > something.)
> 
> As I maintain parts of the build system and the documentation framework I can
> answer that (happened at least once here on the ML):
> 
> - If information is in the UG and is wrong, it is a bug.
> - Documenting all possible build flavors for all distributions we
>   support is too much maintenance work. Looking at the build system and
>   the various hacks we need for outdated distributions (especially the
>   rpm ones with different and broken macros), that would fill pages in
>   the UG to *really* get it right. And I'm reluctant to accept patches
>   for "the general case", it always outdates and needs maintenance for
>   basically no gain. If you need (special kinds of) packages, you are a)
>   clever enough to figure it out on your own, or b) you let somebody
>   else figure that out for you. Thanks to Veith for his great summary,
>   we as LINBIT obviously also provide all kinds of packages for various
>   kinds of distributions for our customers.
> 
> Regards, rck

Thanks for the clarification. May I suggest that a brief comment such as you 
provided above would be an excellent addition to the UG? Since the build 
instructions have always worked for me for the past twelve years, it threw me 
for a loop when they silently disappeared from the 8.4 and 9.X user guides. I 
was constantly wondering if I was doing something wrong by using the old 
instructions as a guideline. 

--Eric
 



Re: [DRBD-user] Updating Kernel w/out Updating DRBD

2018-06-22 Thread Eric Robinson
Wow, Veit, thanks so much for taking time to provide those details!


> -Original Message-
> From: Veit Wahlich [mailto:cru.li...@zodia.de]
> Sent: Friday, June 22, 2018 3:54 AM
> To: Eric Robinson 
> Cc: drbd-user@lists.linbit.com
> Subject: RE: [DRBD-user] Updating Kernel w/out Updating DRBD
> 
> Well, I assume you are on el7 here. Adapt to other distros if required.
> 
> 1. Install dkms, for el7 it is available in EPEL:
> 
> # yum install dkms
> 
> 2. Untar the drbd tarball in /usr/src/, for drbd 8.4.11-1, you should now 
> have a
> directory /usr/src/drbd-8.4.11-1/.
> 
> 3. Create a file /usr/src/drbd-8.4.11-1/dkms.conf with this content:
> 
> PACKAGE_NAME="drbd"
> PACKAGE_VERSION="8.4.11-1"
> MAKE="make -C drbd KDIR=/lib/modules/${kernelver}/build"
> BUILT_MODULE_NAME[0]=drbd
> DEST_MODULE_LOCATION[0]=/kernel/drivers/block
> BUILT_MODULE_LOCATION[0]=drbd
> CLEAN="make -C drbd clean"
> AUTOINSTALL=yes
> 
> 4. Register drbd with dkms, so dkms knows about it:
> 
> # dkms add -m drbd -v 8.4.11-1
> 
> 5. Build the module of the desired version for the current kernel:
> 
> # dkms build -m drbd -v 8.4.11-1
> 
> 6. Install the module of the desired version to the kernel's module
> tree:
> 
> # dkms install -m drbd -v 8.4.11-1
> 
> You should now be able to use drbd.
> 
> dkms installs a hook that will automatically rebuild the module once you 
> install
> a new kernel{,-devel} package.
> On rpm-based distros (maybe also others, I have not tested) and depending on
> configuration, dkms also builds rpms for the new kmods, so all files dkms 
> writes
> are being registered with the package management.
> 
> If you want to remove a dkms installed module, you may simply use:
> 
> # dkms remove -m drbd -v 8.4.11-1 --all
> 
> --all removes the module from all kernel module trees.
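Veit's step 3 is the only part that changes between DRBD versions, so it can be generated. A sketch (the DRBD_VER/SRC_ROOT variables are illustrative; SRC_ROOT defaults to /tmp here so the sketch is safe to run, but on a real system it would be /usr/src to match the untarred source tree from step 2):

```shell
#!/bin/sh
# Sketch: write the dkms.conf from step 3 for a given DRBD 8.4 version, so
# only one value changes on upgrade. Defaults to /tmp for safety; use
# SRC_ROOT=/usr/src on a real system.
set -e
drbd_ver=${DRBD_VER:-8.4.11-1}
dest="${SRC_ROOT:-/tmp}/drbd-$drbd_ver"
mkdir -p "$dest"
cat > "$dest/dkms.conf" <<EOF
PACKAGE_NAME="drbd"
PACKAGE_VERSION="$drbd_ver"
MAKE="make -C drbd KDIR=/lib/modules/\${kernelver}/build"
BUILT_MODULE_NAME[0]=drbd
DEST_MODULE_LOCATION[0]=/kernel/drivers/block
BUILT_MODULE_LOCATION[0]=drbd
CLEAN="make -C drbd clean"
AUTOINSTALL=yes
EOF
echo "wrote $dest/dkms.conf"
# Then register/build/install as in steps 4-6:
#   dkms add -m drbd -v "$drbd_ver"
#   dkms build -m drbd -v "$drbd_ver"
#   dkms install -m drbd -v "$drbd_ver"
```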
> 
> Starting with drbd 9.0, the source tarball also includes an almost ready to
> use dkms.conf file in the debian/ subdir. It is not specific to Debian. You
> may want to copy it to .. and edit the module version number.
> Please note that drbd 9.0 has 2 kernel module files (drbd.ko and
> drbd_transport_tcp.ko) and the module source changed to src/drbd/, so with
> drbd 9.0 use the dkms.conf file provided with the tarball instead of my
> example dkms.conf above.
> 
> Best regards,
> // Veit
> 
> 
> Am Freitag, den 22.06.2018, 08:43 + schrieb Eric Robinson:
> > I'm familiar with the --with-km switch when building drbd, but I don't see
> anything in the documentation that allows building an akmod or dkms version
> instead. How would I do that?
> >
> > Also, I find it odd that the option to build from source is only in
> > the DRBD 8.3 User Guide and was left out of the 8.4 and 9.X User
> > Guides. (I'm sure the reason is obvious to everyone else I just missed
> > something.)
> >
> > --Eric
> 
> 



Re: [DRBD-user] Updating Kernel w/out Updating DRBD

2018-06-22 Thread Eric Robinson
> From: Veit Wahlich [mailto:cru.li...@zodia.de]
> Sent: Friday, June 22, 2018 12:45 AM
> To: Eric Robinson 
> Cc: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] Updating Kernel w/out Updating DRBD
> 
> Hi Eric,
> 
> if your distro is el (e.g. RHEL/CentOS/Scientific), the kernel ABI
> *should* not change during kernel updates, and copying modules from older
> kernel versions as "weak updates" is not uncommon, following the slogan "old
> module is better than no module". This is for example the case for CentOS 7
> and worked quite well in the past, unfortunately with upgrade to 7.5 the ABI
> changed nevertheless and caused many systems even to crash when using
> some old modules, including drbd.
> 
> If you build the module on the system that runs it, you might consider
> installing/building a dkms or akmod package of drbd instead, along with
> dkms/akmod itself. When booting a new kernel, dkms/akmod will check
> whether the packaged modules already exist for the running kernel, and if not,
> they will be built and installed. This works as long as the module source 
> builds
> well against the kernel source/headers provided and all dependencies and build
> tools are present.
> 
> Regards,
> // Veit
> 
> Am Freitag, den 22.06.2018, 04:38 + schrieb Eric Robinson:
> > Greetings -
> >
> > We always build drbd as a KLM, and it seems that every time we update the
> kernel (with yum update) we have to rebuild drbd. This is probably the 
> worlds's
> dumbest question, but is there a way to update the kernel without having to
> rebuild drbd every time?
> >
> > --Eric
> >


I'm familiar with the --with-km switch when building drbd, but I don't see 
anything in the documentation that allows building an akmod or dkms version 
instead. How would I do that?

Also, I find it odd that the option to build from source is only in the DRBD 
8.3 User Guide and was left out of the 8.4 and 9.X User Guides. (I'm sure the 
reason is obvious to everyone else I just missed something.) 

--Eric




[DRBD-user] Updating Kernel w/out Updating DRBD

2018-06-22 Thread Eric Robinson
Greetings -

We always build drbd as a KLM, and it seems that every time we update the 
kernel (with yum update) we have to rebuild drbd. This is probably the world's 
dumbest question, but is there a way to update the kernel without having to 
rebuild drbd every time?

--Eric







[DRBD-user] Can TRIM One drbd volume but not the other.

2018-01-25 Thread Eric Robinson
Here is a WEIRD one.

Why would one drbd volume be trimmable and the other one not?

Here you can see me issuing the trim command against two different filesystems. 
It works on one but fails on the other.

ha11a:~ # fstrim -v /ha01_mysql
/ha01_mysql: 0 B (0 bytes) trimmed

ha11a:~ # fstrim -v /ha02_mysql
fstrim: /ha02_mysql: the discard operation is not supported

Both filesystems are on the same server, two different drbd devices on two 
different mdraid arrays, but the same underlying physical drives.

Yet it can be seen that discard is enabled on drbd0 but not on drbd1...

NAME                          DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                                  0      512B       4G         1
├─sda1                               0      512B       4G         1
│ └─md0                              0      128K     256M         0
├─sda2                               0      512B       4G         1
│ └─md1                              0      128K     256M         0
├─sda3                               0      512B       4G         1
├─sda4                               0      512B       4G         1
├─sda5                               0      512B       4G         1
│ └─md2                              0        1M     256M         0
│   └─drbd0                          0        1M     128M         0
│     └─vg_on_drbd0-lv_on_drbd0 393216        1M     128M         0
└─sda6                               0      512B       4G         1
  └─md3                              0        1M     256M         0
    └─drbd1                          0        0B       0B         0
      └─vg_on_drbd1-lv_on_drbd1      0        0B       0B         0


The filesystems are set up the same. (Note that I do not want automatic 
discard, so that option is not enabled on either filesystem. The problem is not 
the filesystem anyway, since the filesystem relies on drbd, and lsblk shows 
that the drbd volume itself is where discard support is missing.)

ha11a:~ # mount|grep drbd
/dev/mapper/vg_on_drbd1-lv_on_drbd1 on /ha02_mysql type ext4 
(rw,relatime,stripe=160,data=ordered)
/dev/mapper/vg_on_drbd0-lv_on_drbd0 on /ha01_mysql type ext4 
(rw,relatime,stripe=160,data=ordered)
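A stack like this can be scanned mechanically: any layer whose DISC-MAX is 0B cannot pass discards down, so fstrim fails for every filesystem above it. A sketch against a condensed two-device sample of such a listing (on a live system the input would come from `lsblk --discard -r` instead):

```shell
# Flag devices that block discards (DISC-MAX column of 0B). The sample below
# condenses a listing like the one in this thread; feed real data with:
#   lsblk --discard -r | awk 'NR > 1 && $4 == "0B" { print $1 }'
sample='NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
drbd0 0 1M 128M 0
drbd1 0 0B 0B 0'
blocked=$(printf '%s\n' "$sample" | awk 'NR > 1 && $4 == "0B" { print $1 }')
echo "devices blocking discard: $blocked"
```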







[DRBD-user] Cannot connect to the drbdmanaged process using DBus

2017-12-15 Thread Eric Robinson
What does this mean?

ha11a:~ # drbdmanage init 198.51.100.65

You are going to initialize a new drbdmanage cluster.
CAUTION! Note that:
  * Any previous drbdmanage cluster information may be removed
  * Any remaining resources managed by a previous drbdmanage installation
that still exist on this system will no longer be managed by drbdmanage

Confirm:

  yes/no: yes
Empty drbdmanage control volume initialized on '/dev/drbd0'.
Empty drbdmanage control volume initialized on '/dev/drbd1'.

Error: Cannot connect to the drbdmanaged process using DBus
The DBus subsystem returned the following error description:
org.freedesktop.DBus.Error.Spawn.ChildExited: Launch helper exited with unknown 
return code 1

I'm using...

drbd-9.0.9+git.bffac0d9-72.1.x86_64
drbd-kmp-default-9.0.9+git.bffac0d9_k4.4.76_1-72.1.x86_64
drbdmanage-0.99.5-5.1.noarch
drbd-utils-9.0.0-56.1.x86_64






Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-19 Thread Eric Robinson
> > However, there are no zeroes anywhere else in the file except at the
> > end of every sequence.
> >
> > I guess it is safe to conclude that your theory is right. The file
> > does not appear to really be corrupted. The TrimTester tool is
> > reporting a false positive.
> >
> > --Eric
> >
> I think you mentioned that you had at least one C++ consultant look at the
> source code of the TrimTester tool, and that it had even been reviewed by
> Samsung engineers before, and that is the part that I find most worrying.
> Someone of those people should have seen immediately that the code looks
> evidently suspicious in more than just a few places, and should either have
> investigated or at least should have pointed out that the code may be
> flawed.
> 

I agree completely. It may have boiled down to motivation. Those first two guys 
were working essentially gratis and I think they just gave it a cursory 
once-over. The third guy was a /paid/ consultant. He saw the problem 
immediately, fixed it, and committed the changes to github. You get what you 
pay for. 

-Eric


Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-18 Thread Eric Robinson
Okay, here’s the latest.

I ran the TrimTester tool on two separate servers in DRBD standalone mode, and 
both wrote 24TB to disk without any file corruption errors. I then connected 
and sync’d the nodes and ran the test again with replication enabled. 
TrimTester detected file corruption after a few hours.

The supposedly “corrupted” file is 4.9GB in size (as expected, over 4GB). The 
file name is ’50.’ It contains a repeating sequence of characters that looks 
like this:

# od -b 50 | more
000 001 002 003 004 005 006 007 010 011 012 013 014 015 016 017 020
020 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037 040
040 041 042 043 044 045 046 047 050 051 052 053 054 055 056 057 060
060 061 062 063 064 065 066 067 070 071 072 073 074 075 076 077 100
100 101 102 103 104 105 106 107 110 111 112 113 114 115 116 117 120
120 121 122 123 124 125 126 127 130 131 132 133 134 135 136 137 140
140 141 142 143 144 145 146 147 150 151 152 153 154 155 156 157 160
160 161 162 163 164 165 166 167 170 171 172 173 174 175 176 177 200
200 201 202 203 204 205 206 207 210 211 212 213 214 215 216 217 220
220 221 222 223 224 225 226 227 230 231 232 233 234 235 236 237 240
240 241 242 243 244 245 246 247 250 251 252 253 254 255 256 257 260
260 261 262 263 264 265 266 267 270 271 272 273 274 275 276 277 300
300 301 302 303 304 305 306 307 310 311 312 313 314 315 316 317 320
320 321 322 323 324 325 326 327 330 331 332 333 334 335 336 337 340
340 341 342 343 344 345 346 347 350 351 352 353 354 355 356 357 360
360 361 362 363 364 365 366 367 370 371 372 373 374 375 376 377 000


The same sequence repeats to the end of the file. As you can see, each 
iteration of the character sequence ends with a zero/null. However, there are 
no zeroes anywhere else in the file except at the end of every sequence.

I guess it is safe to conclude that your theory is right. The file does not 
appear to really be corrupted. The TrimTester tool is reporting a false 
positive.

--Eric


From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Tuesday, October 17, 2017 9:24 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 
and 9.0

> Most importantly: once the trimtester (or *any* "corruption detecting"
> tool) claims that a certain corruption is found, you look at what supposedly 
> is
> corrupt, and double check if it in fact is.
>
> Before doing anything else.
>
I did that, but I don't know what a "good" file is supposed to look like, so I 
can't tell whether it is really corrupted.
The last time TrimTester reported a corrupt file, I checked it manually and it 
looked fine to me, but I don't know what I'm looking for.

For it to be reported as corrupted by TrimTester, it would need to contain at 
least one aligned 512-byte sector full of zeroes. Did it? Did it contain even 
two zero bytes next to each other?

Or simply the same 256 byte cyclic pattern that is written by this tool?

If so, then obviously the tool in error claimed to find a corruption that 
clearly was not there.

Feel free to keep using that tool, but fix it first: change the unsigned i and 
j to size_t or uint64_t. And maybe have it really check all files before 
removing them.
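The wrap-around behind that fix is easy to reproduce: a 32-bit unsigned offset silently loses everything above 2^32, so byte offsets inside a file larger than 4 GiB alias back to the start of the file. A sketch (variable names are illustrative, not TrimTester's; shell arithmetic is 64-bit, so the 32-bit truncation is made explicit with a mask):

```shell
# An unsigned 32-bit offset wraps at 4 GiB: offset 2^32 + 512 is stored as
# 512, so a checker using it reads and compares the wrong data and reports a
# spurious "corruption" in any file larger than 4 GiB.
FOUR_GIB=$((1 << 32))
real_off=$((FOUR_GIB + 512))        # a file offset just past the 4 GiB mark
wrapped=$((real_off & 0xFFFFFFFF))  # what survives in a 32-bit unsigned
echo "real offset:    $real_off"
echo "wrapped offset: $wrapped"
```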



Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-17 Thread Eric Robinson
> On Tue, Oct 17, 2017 at 02:46:37PM +0000, Eric Robinson wrote:
> > Guys, I think we have an important new development. Last night I put
> > both nodes into standalone mode and promoted the secondary to primary.
> > This effectively gave me two identical test platforms, running
> > disconnected, both writing through DRBD to the local media on each
> > server. I ran the full TrimTester tool, including the scripts which do
> > periodic TRIM while the test is in progress. Everything went a lot
> > faster because of no network replication. Both servers filled their
> > volumes 24 times without any errors. That's 24TB of disk writing
> > without a single file corruption error.
> >
> > Lars, et. al., this would seem to indicate that the "file corruption"
> > error only occurs when the DRBD nodes are sync'd and replicating.
> 
> Which is when DRBD (due to limitted bandwidth on the network) throttles your
> writer threads and the backend has enough IOPS left to serve the reader more
> quickly, letting it make progress up to the 4g files.
> 
> > This morning I have put the cluster back into Primary/Secondary mode
> > and I'm doing a full resync. After that, I'm going to run the
> > TrimTester tool and see if the corruption happens again. If it does,
> > what does that tell us?
> 
> That does tell us that 8 sequential writer threads completely starve out a 
> single
> mmap reader thread that even explicitly asked for no read-ahead, while
> continually having the cache dropped on it.
> 
> You could change the trimtester to ask for readahead and stop dropping
> caches, which makes it completly useless for its original purpose, but would
> much faster find the spurious 4g wrap-around "corruption".
> 
> You could change the trimtester to start with the "big" files, and then slowly
> reduce the size, instead of starting with the "smaller" files and slowly
> increasing the size, or keep the "increasing sizes" direction, but simply 
> skip the
> smaller sizes.
> The checker would start with checking the large files, and immediately hit its
> 32bit wrap-around bug.
> 
> You could tune your backend or the IO schedulers to prefer reads, or throttle
> writes.
> 
> You could change the trimtester code to
> "sleep" occasionally in the writer threads, to give the checker some more room
> to breath.
> 
> You could change the trimtester to wait for all files to be checked before
> exiting completely due to "disk full". Or change it to have a mode that *only*
> does checking, and run that before the "rm -rf test/".
> 
> Whatever.
> 
> Most importantly: once the trimtester (or *any* "corruption detecting"
> tool) claims that a certain corruption is found, you look at what supposedly 
> is
> corrupt, and double check if it in fact is.
> 
> Before doing anything else.
> 

I did that, but I don't know what a "good" file is supposed to look like, so I 
can't tell whether it is really corrupted. The last time TrimTester reported a 
corrupt file, I checked it manually and it looked fine to me, but I don't know 
what I'm looking for. As far as I know, all the TrimTester tool does is check 
if the file contents are all zeroes. I don't know how that helps me. Can you 
tell from looking at the TrimTester code what I should look for to manually 
check the file?

Or if you're aware of another tool that does high-load read/write tests on a 
mounted filesystem and tests for file corruption, I'd be happy to use that 
instead of TrimTester. Of course, I'd still run the scripts that do periodic 
trim and drop the caches, since the whole point is to verify that everything is 
working properly with TRIM enabled. 

> Double check if the tool would still claim corruption exists, even if you 
> cannot
> see that corruption with other tools.
> 
> If so, find out why that tool does that, because that'd be clearly a bug in 
> that
> tool.
> 
> --


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-17 Thread Eric Robinson
Guys, I think we have an important new development. Last night I put both nodes 
into standalone mode and promoted the secondary to primary. This effectively 
gave me two identical test platforms, running disconnected, both writing 
through DRBD to the local media on each server. I ran the full TrimTester tool, 
including the scripts which do periodic TRIM while the test is in progress. 
Everything went a lot faster because of no network replication. Both servers 
filled their volumes 24 times without any errors. That's 24TB of disk writing 
without a single file corruption error. 

Lars et al., this would seem to indicate that the "file corruption" error 
only occurs when the DRBD nodes are sync'd and replicating. This morning I have 
put the cluster back into Primary/Secondary mode and I'm doing a full resync. 
After that, I'm going to run the TrimTester tool and see if the corruption 
happens again. If it does, what does that tell us? 

-Eric 

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Eric Robinson
> Sent: Monday, October 16, 2017 4:13 PM
> To: jan.baku...@gmail.com; drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD
> 8.4 and 9.0
> 
> > > Well, damn. As the program was supposedly reviewed by Samsung
> > engineers as part of their efforts to diagnose the root cause of TRIM
> > errors, it never occurred to me that it was that buggy. I can't thank
> > you enough for finding that! The rollout of some new DRBD clusters has
> > been on hold for 2+ months while we've been trying to track this down.
> >
> > Eric, thanks for the time you took to demonstrate and reproduce an
> > issue you thought you found. These things happen. Samsung engineers
> > are people too :-) although it seems they overlooked a rather elementary
> problem.
> >
> > And many, many thanks to Lars for taking the time to review this issue,
> > finding the cause and putting our minds to rest.
> >
> > kind regards,
> > Jan
> >
> 
> I'm not quite decided. As Lars suggested, I'm going to try to make it fail a
> couple more times and see if the supposedly "corrupt" files are indeed >= 
> 4GiB.
> I'll update everyone when I have more information. Thanks all of you for 
> paying
> attention to this.
> 
> --Eric


Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-16 Thread Eric Robinson
> > Well, damn. As the program was supposedly reviewed by Samsung
> engineers as part of their efforts to diagnose the root cause of TRIM errors, 
> it
> never occurred to me that it was that buggy. I can't thank you enough for
> finding that! The rollout of some new DRBD clusters has been on hold for 2+
> months while we've been trying to track this down.
> 
> Eric, thanks for the time you took to demonstrate and reproduce an issue
> you thought you found. These things happen. Samsung engineers are people
> too :-) although it seems they overlooked a rather elementary problem.
> 
> And many, many thanks to Lars for taking the time to review this issue, finding
> the cause and putting our minds to rest.
> 
> kind regards,
> Jan
>

I'm not quite decided. As Lars suggested, I'm going to try to make it fail a 
couple more times and see if the supposedly "corrupt" files are indeed >= 4GiB. 
I'll update everyone when I have more information. Thanks all of you for paying 
attention to this.

--Eric


Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-16 Thread Eric Robinson
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
> Sent: Saturday, October 14, 2017 1:05 PM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD
> 8.4 and 9.0
> 
> On Thu, Oct 12, 2017 at 11:14:55AM +0200, Robert Altnoeder wrote:
> > On 10/11/2017 11:30 PM, Eric Robinson wrote:
> > > The TrimTester program consists of three parts. The main executable
> > > (TrimTester) just writes loads of data to the drive and tests for
> > > file corruption. My C++ consultant says, "It writes sequential
> > > numbers wrapped at 256, spanning multiple files. It checks
> > > previously written files, and if the file data is all zeroes, it is
> > > considered to be corrupted."
> > Are you referring to this program?
> > https://github.com/algolia/trimtester/blob/master/trimtester.cpp
> >
> > One thing that I can tell you right away is that this program does not
> > appear to be very trustworthy, because it may malfunction due to the
> > use of incorrect datatypes for the purpose - apparently, it is
> > attempting to memory-map quite large files (~ 70 GiB) and check using
> > a byte-indexed offset declared as type 'unsigned', which is commonly
> > only 32 bits wide, and therefore inadequate for the byte-wise indexing
> > of anything that is larger than 4 GiB.
> 
> The test program has other issues as well, like off-by-one (and thus stack
> corruption) when initializing the "buffer" in its "writeAtomically", unlinking
> known non-existent files, and other things.
> Probably harmless.
> 
> But this 32bit index vs >= 4 GiByte file content is the real bug here, thank 
> you
> Robert for pointing that out.
> 
> Why it does not trigger if DRBD is not in the stack I cannot tell, maybe the
> timing is just strangely skewed, and somehow your disk fills up and
> everything terminates before the "DetectCorruption" thread tries to check a
> >= 4GiB file for the first time.
> 
> Anyways: what happens is:
> 
> void _checkFile(const std::string &path, const char *file, std::string &filename) {
>     filename.resize(0);
>     filename.append(path);
>     filename.push_back('/');
>     filename.append(file);
>     MMapedFile mmap(filename.c_str());
>     if (mmap.loaded()) {
>         bool corrupted = false;
>         // Detect any 512-byte page inside the file filled with 0
>         // -> can be caused by a buggy Trim
>         for (unsigned i = 0; !corrupted && i < mmap.len(); i += 512) {
>
>             // After some number of iterations,
>             // i = 4294966784, which is 2^32 - 512;
>             // mmap.len(), however, is *larger*.
>             // In the "i < mmap.len()" comparison, the 32-bit integer i is
>             // size-extended to 64 bits, so that condition remains true.
>
>             if (mmap.len() - i > 4) { // only check pages > 4 bytes to avoid false positives
>
>                 // again, size-extension to 64 bits; condition is true
>
>                 bool pagecorrupted = true;
>
>                 // *assume* that the "page" was corrupted,
>
>                 for (unsigned j = i; j < mmap.len() && j < (i + 512); ++j) {
>
>                     // j = i, which is j = 4294966784. "j < mmap.len()" is again
>                     // true because of the size-extension of j to 64 bits in that
>                     // term, but in the "j < (i + 512)" term, neither j nor i is
>                     // size-extended: i + 512 wraps to 0, "j < 0" is false, and
>                     // the loop will not execute even once,
>                     // which means not a single byte is checked.
>
>                     if (mmap.content()[j] != 0)
>                         pagecorrupted = false;
>                 }
>                 if (pagecorrupted)
>                     corrupted = true;
>
>                 // and we "won" a "corrupted" flag simply by "assuming" it,
>                 // without having checked a single byte.
>                 // "So sad." ;-)
>
>             }
>         }
>         if (corrupted) {
>             std::cerr << "Corrupted file found: " << filename << std::endl;
>             exit(1);
>         }
>     }
> }
>
>
> Just change "unsigned" to "uint64_t" there, and be happy.
>
>
> Don't believe it?
> Create any file of 4 GiB or larger, make sure it does not contain 512
> (aligned) consecutive zeros, and "check" it for "corruption" with that
> logic of trimtester.
> It will report that file as corrupted each time.

Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-13 Thread Eric Robinson
> First, too "all of you",
> if someone has some spare hardware and is willing to run the test as
> suggested by Eric, please do so.
> Both "no corruption reported after X iterations" and "corruption reported
> after X iterations" is important feedback.
> (State the platform and hardware and storage subsystem configuration and
> other potentially relevant info)
> 
> Also, interesting question: did you run your non-DRBD tests on the exact
> same backend (LV, partition, lun, slice, whatever), or on some other "LV" or
> "partition" on the "same"/"similar" hardware?

Same hardware. Procedure was as follows:

6 x SSD drives in system.

Created 2 x volume groups:
vgcreate vg_under_drbd0 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5 /dev/sdf5
vgcreate vg_without_drbd /dev/sda6 /dev/sdb6 /dev/sdc6 /dev/sdd6 /dev/sde6 /dev/sdf6

Created 2 x LVM arrays:
lvcreate -i6 -I4 -l 100%FREE -nlv_under_drbd0 vg_under_drbd0
lvcreate -i6 -I4 -l 100%FREE -nlv_without_drbd vg_without_drbd

Started drbd

Created an ext4 filesystem on /dev/drbd0
Created an ext4 filesystem on /dev/vg_without_drbd/lv_without_drbd

Mounted /dev/drbd0 on /volume1
Mounted /dev/vg_without_drbd/lv_without_drbd on /volume2

Ran TrimTester on /volume1. It failed after writing 700-900 GB on multiple test iterations.
Ran TrimTester on /volume2. No failure after 20 TB written.

> 
> Now,
> "something" is different between test run with or without DRBD.
> 
> First suspect was something "strange" happening with TRIM, but you think
> you can rule that out, because you ran the test without trim as well.
> 
> The file system itself may cause discards (explicit mount option "discard",
> implicit potentially via mount options set in the superblock), it does not 
> have
> to be the "fstrim".

The discard option was not explicitly set. I'm not sure about implicitly.

> 
> Or maybe you still had the fstrim loop running in the background from a
> previous test, or maybe something else does an fstrim.
> 
> So we should double check that, to really rule out TRIM as a suspect.
> 

Good thought, but I was careful to ensure that the shell script which performs 
the trim was not running.


> You can disable all trim functionality in linux by
> echo 0 > /sys/devices/pci:00/:00:01.1/ata2/host1/target1:0:0/1:0:0:0/block/sr0/queue/discard_max_bytes
> (or similar nodes)
> 
> something like this, maybe:
> echo 0 | tee  /sys/devices/*/*/*/*/*/*/block/*/queue/discard_max_bytes
> 
> To have that take effect for "higher level" or "logical" devices, you'd have 
> to
> "stop and start" those, so deactivate DRBD, deactivate volume group,
> deactivate md raid, then reactivate all of it.
> 
> double check with "lsblk -D" if the discards now are really disabled.
> 
> then re-run the tests.
> 

Okay, I will try that. 

> 
> In case "corruption reported" even if we are "certain" that discard is out of
> the picture, that is an important data point as well.
> 
> What changes when DRBD is in the IO stack?
> Timing (when does the backend device see which request) may be changed.
> Maximum request size may be changed.
> Maximum *discard* request size *will* be changed, which may result in
> differently split discard requests on the backend stack.
> 
> Also, we have additional memory allocations for DRBD meta data and
> housekeeping, so possibly different memory pressure.
> 
> End of brain-dump.
> 
> 

In the meantime, I tried a different kind of test, as follows:

ha11a:~ # badblocks -b 4096 -c 4096 -s /dev/drbd0 -w
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done

Of course, /dev/drbd0 was unmounted at the time. 

It ran for 16 hours and reported NO bad blocks. I'm not sure if this provides 
any useful clues.  

-Eric




Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-12 Thread Eric Robinson
> Are you referring to this program?
> https://github.com/algolia/trimtester/blob/master/trimtester.cpp
>

Yes, that is the program.
 
> One thing that I can tell you right away is that this program does not appear 
> to
> be very trustworthy, because it may malfunction due to the use of incorrect
> datatypes for the purpose - apparently, it is attempting to memory-map quite
> large files (~ 70 GiB) and check using a byte-indexed offset declared as type
> 'unsigned', which is commonly only 32 bits wide, and therefore inadequate for
> the byte-wise indexing of anything that is larger than 4 GiB.
> 
> While this indicates that the program might miss actual corruption, so far I
> have not found any definitive proof that the program will generate false
> positives (however, I did not check the program in much detail, judging by the
> overall quality I would not be surprised if it did), so we should still 
> continue
> investigating.
> 
> I would certainly recommend to double-check by running some other software
> to check for data corruption issues to ensure that the problem is not
> malfunctioning test software.
> 

I can't personally dispute your comments. I did have it checked by two other 
C++ consultants and neither one mentioned what you said. However, all I can 
tell you is that the program runs continuously without a problem when DRBD is 
not in the stack. I ran it for 15+ hours and wrote 20 TB of data without 
corruption. However, when DRBD is in the stack, it reports corruption after 
about 700-900 GB of data has been written.

I would be happy to try a different tool if you can recommend (or write) one.   

--Eric


Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-11 Thread Eric Robinson
> > To gather a few more data points,
> > does the behavior on DRBD change, if you  disk { disable-write-same; }
> > # introduced only with drbd 8.4.10 or if you set  disk  { al-updates
> > no; } # affects timing, among other things

Yes, the behavior is the same with 'al-updates no'. The program detected a 
corrupt file after writing 823 GB. 

Plus, see my answers below...


> -Original Message-
> From: Eric Robinson
> Sent: Wednesday, October 11, 2017 2:30 PM
> To: Eric Robinson <eric.robin...@psmnv.com>; Lars Ellenberg
> <lars.ellenb...@linbit.com>; drbd-user@lists.linbit.com
> Subject: RE: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD
> 8.4 and 9.0
> 
> Hi Lars -
> 
> I'm finally back from my trip and eager to get rolling on this.
> 
> >Interesting.
> >Actually, alarming.
> 
> Glad we agree on that!
> 
> > Which *exact* DRBD module versions, identified by their git commit ids?
> 
> Does this answer your question?
> 
> ha11a:~ # modinfo drbd
> filename:   /lib/modules/4.4.74-92.29-default/updates/drbd.ko
> alias:  block-major-147-*
> license:GPL
> version:8.4.10-1
> description:drbd - Distributed Replicated Block Device v8.4.10-1
> author: Philipp Reisner <p...@linbit.com>, Lars Ellenberg
> <l...@linbit.com>
> srcversion: 611DC432097FDFEB703FF9F
> depends:libcrc32c
> vermagic:   4.4.74-92.29-default SMP mod_unload modversions
> 
> > "to make sure SSD TRIM was not a factor":
> > how exactly did you try to do that?
> 
> The TrimTester program consists of three parts. The main executable
> (TrimTester) just writes loads of data to the drive and tests for file 
> corruption.
> My C++ consultant says, "It writes sequential numbers wrapped at 256,
> spanning multiple files. It checks previously written files, and if the file 
> data is
> all zeroes, it is considered to be corrupted."
> 
> The other two parts of the tool are shell scripts. One script periodically 
> calls
> fstrim, the other periodically drops the caches. I simply ran the TrimTester
> executable without the scripts so the fstrim command never got called during
> the test.
> 
> > What are the ext4 mount options,
> > explicit or implicit?
> > (as reported by tune2fs and /proc/mounts)
> 
> ha11a:~ # cat /proc/mounts|grep ha
> /dev/drbd0 /ha01_mysql ext4 rw,relatime,stripe=6,data=ordered 0 0
> 
> > To gather a few more data points,
> > does the behavior on DRBD change, if you  disk { disable-write-same; }
> > # introduced only with drbd 8.4.10 or if you set  disk  { al-updates
> > no; } # affects timing, among other things
> 
> 
> 8.4.1 did not recognize the 'disable-write-same' option, but I'm testing right
> now with 'al-updates no' and I'll report the results!
> 
> > Can you reproduce with other backend devices?
> 
> I don't have any other backend devices to test with. All I know is that the
> problem does not occur when writing directly to the devices (bypassing the
> drbd layer).
> 
> --Eric
> 



Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-11 Thread Eric Robinson
Hi Lars -

I'm finally back from my trip and eager to get rolling on this. 

>Interesting.
>Actually, alarming.

Glad we agree on that!

> Which *exact* DRBD module versions, identified by their git commit ids?

Does this answer your question?

ha11a:~ # modinfo drbd
filename:   /lib/modules/4.4.74-92.29-default/updates/drbd.ko
alias:  block-major-147-*
license:GPL
version:8.4.10-1
description:drbd - Distributed Replicated Block Device v8.4.10-1
author: Philipp Reisner , Lars Ellenberg 

srcversion: 611DC432097FDFEB703FF9F
depends:libcrc32c
vermagic:   4.4.74-92.29-default SMP mod_unload modversions

> "to make sure SSD TRIM was not a factor":
> how exactly did you try to do that?

The TrimTester program consists of three parts. The main executable 
(TrimTester) just writes loads of data to the drive and tests for file 
corruption. My C++ consultant says, "It writes sequential numbers wrapped at 
256, spanning multiple files. It checks previously written files, and if the 
file data is all zeroes, it is considered to be corrupted."

The other two parts of the tool are shell scripts. One script periodically 
calls fstrim, the other periodically drops the caches. I simply ran the 
TrimTester executable without the scripts so the fstrim command never got 
called during the test.

> What are the ext4 mount options,
> explicit or implicit?
> (as reported by tune2fs and /proc/mounts)

ha11a:~ # cat /proc/mounts|grep ha
/dev/drbd0 /ha01_mysql ext4 rw,relatime,stripe=6,data=ordered 0 0

> To gather a few more data points,
> does the behavior on DRBD change, if you  disk { disable-write-same; } 
> # introduced only with drbd 8.4.10 or if you set  disk  { al-updates no; } 
> # affects timing, among other things


8.4.1 did not recognize the 'disable-write-same' option, but I'm testing right 
now with 'al-updates no' and I'll report the results!

> Can you reproduce with other backend devices?

I don't have any other backend devices to test with. All I know is that the 
problem does not occur when writing directly to the devices (bypassing the drbd 
layer).

--Eric




Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

2017-10-05 Thread Eric Robinson
Hi Lars --

I've been travelling and just saw your response, and now I'm travelling again. 
I am very eager to provide answers to your questions and will do so at my first 
opportunity!

--Eric 

-Original Message-
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Tuesday, October 3, 2017 12:43 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 
and 9.0

On Mon, Sep 25, 2017 at 09:02:57PM +, Eric Robinson wrote:
> Problem:
> 
> Under high write load, DRBD exhibits data corruption. In repeated 
> tests over a month-long period, file corruption occurred after 700-900 
> GB of data had been written to the DRBD volume.

Interesting.
Actually, alarming.

Can anyone else reproduce these findings?
In a similar or different environment?

> Testing Platform:
> 
> 2 x Dell PowerEdge R610 servers
> 32GB RAM
> 6 x Samsung SSD 840 Pro 512GB (latest firmware) Dell H200 JBOD 
> Controller SUSE Linux Enterprise Server 12 SP2 (kernel 4.4.74-92.32) 
> Gigabit network, 900 Mbps throughput, < 1ms latency, 0 packet loss
> 
> Initial Setup:
> 
> Create 2 RAID-0 software arrays using either mdadm or LVM
> On Array 1: sda5 through sdf5, create DRBD replicated volume (drbd0) with 
> an ext4 filesystem
> On Array 2: sda6 through sdf6, create LVM logical volume with an 
> ext4 filesystem
> 
> Procedure:
> 
> Download and build the TrimTester SSD burn-in and TRIM verification tool 
> from Algolia (https://github.com/algolia/trimtester).
> Run TrimTester against the filesystem on drbd0, wait for corruption to 
> occur
> Run TrimTester against the non-drbd backed filesystem, wait for 
> corruption to occur
> 
> Results:
> 
> In multiple tests over a period of a month, TrimTester would report 
> file corruption when run against the DRBD volume after 700-900 GB of 
> data had been written. The error would usually appear within an hour 
> or two. However, when running it against the non-DRBD volume on the 
> same physical drives, no corruption would occur. We could let the 
> burn-in run for 15+ hours and write 20+ TB of data without a problem.
> Results were the same with DRBD 8.4 and 9.0.

Which *exact* DRBD module versions, identified by their git commit ids?

> We also tried disabling
> the TRIM-testing part of TrimTester and using it as a simple burn-in 
> tool, just to make sure that SSD TRIM was not a factor.

"to make sure SSD TRIM was not a factor":
how exactly did you try to do that?
What are the ext4 mount options,
explicit or implicit?
(as reported by tune2fs and /proc/mounts)

> Conclusion:
> 
> We are aware of some controversy surrounding the Samsung SSD 8XX 
> series drives; however, the issues related to that controversy were 
> resolved and no longer exist as of kernel 4.2. The 840 Pro drives are 
> confirmed to support RZAT. Also, the data corruption would only occur 
> when writing through the DRBD layer. It never occurred when bypassing 
> the DRBD layer and writing directly to the drives, so we must conclude 
> that DRBD has a data corruption bug under high write load.

Or that DRBD changes the timing / IO pattern seen by the backend sufficiently 
to expose a bug elsewhere.

> However, we would be more than happy to be proved wrong.

To gather a few more data points,
does the behavior on DRBD change, if you  disk { disable-write-same; } # 
introduced only with drbd 8.4.10 or if you set  disk  { al-updates no; } # 
affects timing, among other things

Can you reproduce with other backend devices?

--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD(r) and LINBIT(r) are registered trademarks of LINBIT -- please don't Cc 
me, but send to list -- I'm subscribed 


[DRBD-user] drbdadm verify status starts at 100% and stays there?

2017-08-03 Thread Eric Robinson
I have drbd 9.0.8. I started an online verify, and immediately checked status, 
and I see...

ha11a:/ha01_mysql/trimtester # drbdadm status
ha01_mysql role:Primary
  disk:UpToDate
  ha11b role:Secondary
replication:VerifyT peer-disk:UpToDate done:100.00

...but the tail of dmesg says...

[336704.851209] drbd ha01_mysql/0 drbd0 ha11b: repl( Established -> VerifyT )
[336704.851244] drbd ha01_mysql/0 drbd0: Online Verify start sector: 0

...which looks like the verify is still in progress.

So is it done, or is it still in progress? Is this a drbd bug?

--
Eric Robinson





Re: [DRBD-user] DRBD and TRIM -- Slow! -- RESOLVED

2017-08-03 Thread Eric Robinson
Typos corrected.

--
Eric Robinson

From: Eric Robinson
Sent: Thursday, August 03, 2017 10:09 AM
To: Eric Robinson <eric.robin...@psmnv.com>; drbd-user@lists.linbit.com
Subject: RE: DRBD and TRIM -- Slow! -- RESOLVED

For anyone else who has this problem, we have reduced the time required to trim 
a 1.3TB volume from 3 days to 1.5 minutes.

Initially, we used mdraid to build a raid0 array with a 32K chunk size. We 
initialized it as a drbd disk, synced it, built an lvm logical volume on it, 
and created an ext4 filesystem on the volume. Creating the filesystem and 
trimming it took 3 days (each time, every time, across multiple tests).

When running lsblk -D, we noticed that the DISC-MAX value for the array was 
only 32K, compared to 4GB for the SSD drive itself. We also noticed that the 
number matched the chunk size. We theorized that the small DISC-MAX value was 
responsible for the slow trim rate across the DRBD link. We deleted the array 
and built a new one with a 4MB chunk size. The DISC-MAX value changed to 4MB, 
which is the max selectable chunk size (but still way below the other DISC-MAX 
values shown in lsblk -D). We realized that, when using mdadm, the DISC-MAX 
value ends up matching the array chunk size.

Instead of using mdadm to build the array, we used LVM to create a striped 
logical volume and made that the backing device for drbd. Then lsblk -D showed 
a DISC-MAX size of 128MB.  Creating an ext4 filesystem on it and trimming only 
took 1.5 minutes (across multiple tests).

Somebody knowledgeable may be able to explain how DISC-MAX affects the trim 
speed, and why the DISC-MAX value is different when creating the array with 
mdadm versus lvm.

--
Eric Robinson


From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Eric Robinson
Sent: Tuesday, August 01, 2017 3:28 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] DRBD and TRIM -- Slow!

I have 6 x Samsung SSD 840 pro drives in an RAID 0 configuration (mdraid).

When I write to a non-DRBD partition on the drive, bypassing caches, I get 400 
MB/sec.

When I trim a filesystem mounted on a non-DRBD partition, it finishes fast.

When I write to a DRBD replicated volume, I get 80MB/sec, which is about when I 
would expect using protocol C over a gigabit network.

When I trim a filesystem that is mounted on a drbd device it is extremely slow. 
It takes three days to trim a 1.2TB volume.

Running iperf between nodes shows 900Mbits/sec bandwidth, <1 ms latency, no 
packet loss.

Why is trimming a DRBD volume so slow? This makes the servers unusable.

--
Eric Robinson



Re: [DRBD-user] DRBD and TRIM -- Slow! -- RESOLVED

2017-08-03 Thread Eric Robinson
For anyone else who has this problem, we have reduced the time required to trim 
a 1.3TB volume from 3 days to 1.5 minutes.

Initially, we used mdraid to build a raid0 array with a 32K chunk size. We 
initialized it as a drbd disk, synced it, built an lvm logical volume on it, 
and created an ext4 filesystem on the volume. Creating the filesystem and 
trimming it took 3 days (each time, every time, across multiple tests).

When running lsblk -D, we noticed that the DISC-MAX value for the array was 
only 32K, compared to 4GB for the SSD drive itself. We also noticed that the 
number matched the chunk size. We theorized that the small DISC-MAX value was 
responsible for the slow trim rate across the DRBD link. We deleted the array 
and built a new one with a 4MB chunk size. The DISC-MAX value changed to 4MB, 
which is the max selectable chunk size (but still way below the other DISC-MAX 
values shown in lsblk -D). We realized that, when using mdadm, the DISC-MAX 
value ends up matching the array chunk size.

Instead of using mdadm to build the array, we used LVM to create a striped 
logical volume and made that the backing device for drbd. Then lsblk -D showed 
a DISC-MAX size of 128MB.  Creating an ext4 filesystem on it and trimming only 
took 1.5 minutes (across multiple tests).

Somebody knowledgeable may be able to explain how DISC-MAX affects the trim 
speed, and why the DISC-MAX value is different when creating the array with 
mdadm versus lvm.

--
Eric Robinson


From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Eric Robinson
Sent: Tuesday, August 01, 2017 3:28 PM
To: drbd-user@lists.linbit.com
Subject: [DRBD-user] DRBD and TRIM -- Slow!

I have 6 x Samsung SSD 840 pro drives in an RAID 0 configuration (mdraid).

When I write to a non-DRBD partition on the drive, bypassing caches, I get 400 
MB/sec.

When I trim a filesystem mounted on a non-DRBD partition, it finishes fast.

When I write to a DRBD replicated volume, I get 80MB/sec, which is about when I 
would expect using protocol C over a gigabit network.

When I trim a filesystem that is mounted on a drbd device it is extremely slow. 
It takes three days to trim a 1.2TB volume.

Running iperf between nodes shows 900Mbits/sec bandwidth, <1 ms latency, no 
packet loss.

Why is trimming a DRBD volume so slow? This makes the servers unusable.

--
Eric Robinson



[DRBD-user] DRBD and TRIM -- Slow!

2017-08-01 Thread Eric Robinson
I have 6 x Samsung SSD 840 pro drives in an RAID 0 configuration (mdraid).

When I write to a non-DRBD partition on the drive, bypassing caches, I get 400 
MB/sec.

When I trim a filesystem mounted on a non-DRBD partition, it finishes fast.

When I write to a DRBD replicated volume, I get 80MB/sec, which is about when I 
would expect using protocol C over a gigabit network.

When I trim a filesystem that is mounted on a drbd device it is extremely slow. 
It takes three days to trim a 1.2TB volume.

Running iperf between nodes shows 900Mbits/sec bandwidth, <1 ms latency, no 
packet loss.

Why is trimming a DRBD volume so slow? This makes the servers unusable.

--
Eric Robinson



Re: [DRBD-user] DRBD Fails to Establish Connection

2017-07-31 Thread Eric Robinson
# make drbdmon

...worked fine.

Thanks!

Now if I could just figure out why discards are going so slow. At this rate, it 
will take 3 days to discard a couple of 1.2TB filesystems. 

--
Eric Robinson
   

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Robert Altnoeder
> Sent: Monday, July 31, 2017 7:43 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] DRBD Fails to Establish Connection
> 
> On 07/31/2017 03:58 PM, Eric Robinson wrote:
> > Since the following is what I have installed, should it have built?
> >
> > ha11a:~ # rpm -qa|grep gcc
> > gcc48-c++-4.8.5-30.1.x86_64
> > gcc-4.8-6.189.x86_64
> > libgcc_s1-32bit-6.2.1+r239768-2.4.x86_64
> > libgcc_s1-6.2.1+r239768-2.4.x86_64
> > gcc48-4.8.5-30.1.x86_64
> > gcc-c++-4.8-6.189.x86_64
> I have no SuSE test system available right now, but I can confirm it does 
> build
> with devtoolset-2 g++ 4.8.2 on CentOS 6.8 Final.
> 
> If the autotools do not recognize the compiler as capable of building it,
> you can still try to run:
> cd drbd-utils/user/drbdmon
> make drbdmon
> 
> (or something like 'make -j 20 drbdmon' for a parallelized build with up to 20
> tasks)
> 
> If that build succeeds, it will create a binary file 'drbdmon' in the same
> directory, which you can simply move to wherever the other drbd-utils are
> installed, commonly /usr/sbin/
> 
> --
> Robert Altnoeder
> +43 1 817 82 92 0
> robert.altnoe...@linbit.com
> 
> LINBIT | Keeping The Digital World Running DRBD - Corosync - Pacemaker f /
> t /  in /  g+
> 
> DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
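
The fallback build quoted above can be sketched as a guarded script (the checkout path drbd-utils is an assumption; installing to /usr/sbin needs root):

```shell
# Sketch of the manual drbdmon build described above. Guarded so it
# degrades gracefully when the source tree is not checked out here.
if [ -d drbd-utils/user/drbdmon ]; then
    ( cd drbd-utils/user/drbdmon &&
      make drbdmon &&               # or: make -j 20 drbdmon
      install -m 0755 drbdmon /usr/sbin/ )
    status="built and installed"
else
    status="source tree not found; clone drbd-utils first"
fi
echo "drbdmon: $status"
```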


Re: [DRBD-user] DRBD Fails to Establish Connection

2017-07-31 Thread Eric Robinson
Since the following is what I have installed, should it have built? 

ha11a:~ # rpm -qa|grep gcc
gcc48-c++-4.8.5-30.1.x86_64
gcc-4.8-6.189.x86_64
libgcc_s1-32bit-6.2.1+r239768-2.4.x86_64
libgcc_s1-6.2.1+r239768-2.4.x86_64
gcc48-4.8.5-30.1.x86_64
gcc-c++-4.8-6.189.x86_64

--
Eric Robinson
Chief Information Officer
Physician Select Management, LLC
775.885.2211 x 112
   

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Robert Altnoeder
> Sent: Monday, July 31, 2017 3:15 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] DRBD Fails to Establish Connection
> 
> On 07/31/2017 11:35 AM, Eric Robinson wrote:
> > I'm using drbd-utils-9.0.0 but it does not appear to have drbdmon.
> 
> If you built from source, make sure your compiler is new enough.
> For drbdmon to be included in the build, you need a C++11-capable compiler.
> E.g., gcc/g++ 4.7.x is not detected as C++11-capable, although it might
> actually compile with this version; I can confirm it gets built with g++
> 4.9.2 on Debian 8 and g++ 5.3.1 on Ubuntu.
> 
> --
> Robert Altnoeder
> +43 1 817 82 92 0
> robert.altnoe...@linbit.com
> 
> LINBIT | Keeping The Digital World Running DRBD - Corosync - Pacemaker f /
> t /  in /  g+
> 
> DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Discarding device blocks very slow

2017-07-31 Thread Eric Robinson
It's been running all night, and still at:

Discarding device blocks : 60297216/322122752
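
Plugging that counter into a quick progress estimate (the elapsed time is a guess at "all night", roughly 8 hours):

```shell
# Progress/ETA estimate from the mkfs discard counter above.
done_blocks=60297216       # blocks discarded so far (from the output above)
total_blocks=322122752     # total blocks to discard
elapsed_hours=8            # assumption: "running all night"
summary=$(awk -v d="$done_blocks" -v t="$total_blocks" -v h="$elapsed_hours" \
    'BEGIN { printf "%.1f%% done, ~%.0f hours to go at this rate", 100*d/t, h*(t-d)/d }')
echo "$summary"
```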

--
Eric Robinson
Chief Information Officer
Physician Select Management, LLC
775.885.2211 x 112
   

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Eric Robinson
> Sent: Monday, July 31, 2017 2:33 AM
> To: Robert Altnoeder <robert.altnoe...@linbit.com>; drbd-
> u...@lists.linbit.com
> Subject: Re: [DRBD-user] Discarding device blocks very slow
> 
> For what it's worth:
> 
> 1. Discard is enabled and working on both nodes.
> 2. No messages about trim or discard in the logs.
> 3. Network latency is sub-1 millisecond, bandwidth is 1 Gbit (utilization 
> about
> 10%).
> 4. Drbd is version 9.0.8 on both nodes.
> 
> --
> Eric Robinson
> 
> 
> > -Original Message-
> > From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> > boun...@lists.linbit.com] On Behalf Of Robert Altnoeder
> > Sent: Monday, July 31, 2017 2:17 AM
> > To: drbd-user@lists.linbit.com
> > Subject: Re: [DRBD-user] Discarding device blocks very slow
> >
> > On 07/31/2017 10:46 AM, Eric Robinson wrote:
> > > But that still does not explain why discards across the network are so
> slow.
> >
> > There are lots of possible reasons.
> Discard might not be enabled on both nodes; there should be a message
> about discard being available or unavailable in the logs. IIRC, if it
> can't discard, it will zero out.
> > The network may have too high latency or too low bandwidth.
> > Without a great amount of detailed information all one can do about
> > performance-related issues is to guess.
> >
> > Generally, both nodes should be running the same and most recent
> > version of DRBD 9, because all those features have been added quite
> > recently, and lots of enhancements and fixes are still applied to the code.
> >
> > --
> > Robert Altnoeder
> > +43 1 817 82 92 0
> > robert.altnoe...@linbit.com
> >
> > LINBIT | Keeping The Digital World Running DRBD - Corosync - Pacemaker
> > f / t /  in /  g+
> >
> > DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
> > ___
> > drbd-user mailing list
> > drbd-user@lists.linbit.com
> > http://lists.linbit.com/mailman/listinfo/drbd-user
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Fails to Establish Connection

2017-07-31 Thread Eric Robinson
I'm using drbd-utils-9.0.0 but it does not appear to have drbdmon.

Drbdtop sounds awesome.  

--
Eric Robinson
   

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Robert Altnoeder
> Sent: Monday, July 31, 2017 2:06 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] DRBD Fails to Establish Connection
> 
> On 07/24/2017 10:08 PM, Roland Kammerer wrote:
> > On Mon, Jul 24, 2017 at 06:53:25PM +, Eric Robinson wrote:
> >>>   # drbdadm status
> >>>   # drbdadm status --verbose
> >>>   # drbdsetup status
> >>>   # drbdsetup status --statistics
> >>>   # drbdsetup status --verbose --statistics
> >>>
> >> Well, there you go. I guess I keep running into places where commands
> >> do not work as they did previously
> [...]
> > So, with drbd8 you had simple point-to-point connections with two
> > nodes and you had a hand full of resources. That can be exposed via
> > /proc nicely.
> >
> > With drbd9 you can have hundreds of resources. They can have complex
> > topologies like full meshes,
> [...]
> > Exposing that via /proc is a bad idea.
> 
> By the way, newer drbd-utils versions also come with a utility called
> 'drbdmon', which displays a live-view of the resources in a very compressed
> format by hiding or shortening most unnecessary information (all the things
> that are working the way they should) while still highlighting those 
> resources,
> volumes and connections that are in a problematic state, and showing
> detailed status only for those items.
> 
> It was originally written for internal use at LINBIT while testing with many
> resources, where even the output of 'drbdadm status' was too cumbersome
> to read. Since it turned out to be useful for getting a quick overview of 
> what's
> going on with DRBD 9 resources, it was decided to make it publicly available
> as a part of the drbd-utils package.
> 
> To start it, with a unicode-capable terminal, you simply type drbdmon
> 
> If your terminal is not unicode capable, use drbdmon --ascii
> 
> Once in, several hotkeys allow navigating and switching views, most of it is
> pretty much self-explanatory.
> 
> Another utility that is being actively developed by LINBIT staff is 'drbdtop',
> which also enables a user to change the state of
> resources/volumes/connections using hotkeys instead of having to type
> drbdadm commands.
> 
> Therefore I think it is safe to say that you can expect many improvements of
> that kind in the future, and once all the new tools go into production use, I
> guess no one is going to miss the /proc/drbd file anymore.
> 
> --
> Robert Altnoeder
> +43 1 817 82 92 0
> robert.altnoe...@linbit.com
> 
> LINBIT | Keeping The Digital World Running DRBD - Corosync - Pacemaker f /
> t /  in /  g+
> 
> DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Discarding device blocks very slow

2017-07-31 Thread Eric Robinson
For what it's worth:

1. Discard is enabled and working on both nodes.
2. No messages about trim or discard in the logs.
3. Network latency is sub-1 millisecond, bandwidth is 1 Gbit (utilization about 
10%). 
4. Drbd is version 9.0.8 on both nodes.

--
Eric Robinson
   

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Robert Altnoeder
> Sent: Monday, July 31, 2017 2:17 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] Discarding device blocks very slow
> 
> On 07/31/2017 10:46 AM, Eric Robinson wrote:
> > But that still does not explain why discards across the network are so slow.
> 
> There are lots of possible reasons.
> Discard might not be enabled on both nodes; there should be a message
> about discard being available or unavailable in the logs. IIRC, if it
> can't discard, it will zero out.
> The network may have too high latency or too low bandwidth.
> Without a great amount of detailed information all one can do about
> performance-related issues is to guess.
> 
> Generally, both nodes should be running the same and most recent version
> of DRBD 9, because all those features have been added quite recently, and
> lots of enhancements and fixes are still applied to the code.
> 
> --
> Robert Altnoeder
> +43 1 817 82 92 0
> robert.altnoe...@linbit.com
> 
> LINBIT | Keeping The Digital World Running DRBD - Corosync - Pacemaker f /
> t /  in /  g+
> 
> DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Discarding device blocks very slow

2017-07-31 Thread Eric Robinson
Doh. Okay, I guess I was thinking that DRBD deals with logical blocks and would 
not know anything about the internal physical structure of the SSD drives, and 
the physical structure of one SSD drive would not necessarily be the same as 
that of the one at the other end of the network, so why would discards be 
replicated? But that's dumb, because it still has to replicate the logical 
block info.

But that still does not explain why discards across the network are so slow. I 
mean, it's been going for like an hour and it's still only about this far...

Discarding device blocks:  13111296/322122752

--
Eric Robinson
   

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Robert Altnoeder
> Sent: Monday, July 31, 2017 1:29 AM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] Discarding device blocks very slow
> 
> On 07/31/2017 09:46 AM, Eric Robinson wrote:
> >
> > When I create a filesystem on an LVM logical volume on a drbd device,
> > I see "Discarding device blocks." I wouldn't think that discards would
> > be replicated, but it is going super slow. I mean SUPER slow. Discards
> > normally go about 100 times faster. Are discards replicated across the
> > network for some reason?
> >
> Obviously. If a block is discarded locally, the same change has to be applied 
> to the replica; otherwise a read from the same block on both systems can 
> yield different results.
> >
> > --
> >
> > Eric Robinson
> >
> --
> Robert Altnoeder
> +43 1 817 82 92 0
> robert.altnoe...@linbit.com
> 
> LINBIT | Keeping The Digital World Running DRBD - Corosync - Pacemaker f / t
> / in / g+
> 
> DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Discarding device blocks very slow

2017-07-31 Thread Eric Robinson
When I create a filesystem on an LVM logical volume on a drbd device, I see 
"Discarding device blocks." I wouldn't think that discards would be replicated, 
but it is going super slow. I mean SUPER slow. Discards normally go about 100 times 
faster. Are discards replicated across the network for some reason?

ha11a:/dev # mkfs.ext4 /dev/vg_on_drbd0/lv_on_drbd0
mke2fs 1.42.11 (09-Jul-2014)
Discarding device blocks:   3149824/322122752

--
Eric Robinson



Re: [DRBD-user] Show Current Resync Rate

2017-07-26 Thread Eric Robinson
Vlado -- I just want to say this was a wonderful tip. This is exactly what I 
was looking for...

0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-
ns:59926712 nr:0 dw:0 dr:59928512 al:0 bm:0 lo:0 pe:3 ua:2 ap:0 
ep:1 wo:2 oos:1114423564
[>...] sync'ed:  5.2% (1088304/1146824)M
finish: 3:51:56 speed: 80,076 (75,160 -- 76,236) K/sec
  0% sector pos: 0/2348696088
resync: used:1/61 hits:143940 misses:916 starving:0 locked:0 changed:458
act_log: used:0/1237 hits:0 misses:0 starving:0 locked:0 changed:0
blocked on activity log: 0
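
If only the rate number is wanted, it can be pulled out of output like the above with awk (a sketch; the sample line is copied from the paste above):

```shell
# Extract the current resync speed (K/sec) from a drbd8-style status line.
sample="finish: 3:51:56 speed: 80,076 (75,160 -- 76,236) K/sec"
speed=$(printf '%s\n' "$sample" |
    awk '{ for (i = 1; i < NF; i++)
               if ($i == "speed:") { gsub(",", "", $(i+1)); print $(i+1) } }')
echo "current resync rate: $speed K/sec"
```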

--
Eric Robinson

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Eric Robinson
Sent: Wednesday, July 26, 2017 12:04 PM
To: Vladimír Bartoš <bar...@jadro.org>; drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Show Current Resync Rate

Well, there you go! Thanks, Vlado!

--
Eric Robinson

From: 
drbd-user-boun...@lists.linbit.com<mailto:drbd-user-boun...@lists.linbit.com> 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Vladimír Bartoš
Sent: Wednesday, July 26, 2017 12:07 AM
To: drbd-user@lists.linbit.com<mailto:drbd-user@lists.linbit.com>
Subject: Re: [DRBD-user] Show Current Resync Rate

you can get actual resync speed from

/sys/kernel/debug/drbd/resources/${resource_name}/connections/${hostname}/0/proc_drbd
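
A small guarded wrapper around that tip might look like this (resource and peer names are placeholders; the debugfs layout is as given above):

```shell
# Read the current resync figures from DRBD 9's debugfs view, if present.
RES=ha01_mysql   # placeholder resource name
PEER=ha11b       # placeholder peer hostname
f="/sys/kernel/debug/drbd/resources/$RES/connections/$PEER/0/proc_drbd"
if [ -r "$f" ]; then
    grep -E 'sync|speed' "$f"
else
    echo "not readable on this host: $f"
fi
```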

--
Vlado Bartos
bar...@jadro.org<mailto:bar...@jadro.org>




On 26 Jul 2017, at 08:28, Eric Robinson 
<eric.robin...@psmnv.com<mailto:eric.robin...@psmnv.com>> wrote:

I think that the resync progress in drbd9 is visible in drbdadm status

Progress percent yes, rate no.

--
Eric Robinson



Re: [DRBD-user] Show Current Resync Rate

2017-07-26 Thread Eric Robinson
Well, there you go! Thanks, Vlado!

--
Eric Robinson

From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Vladimír Bartoš
Sent: Wednesday, July 26, 2017 12:07 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Show Current Resync Rate

you can get actual resync speed from

/sys/kernel/debug/drbd/resources/${resource_name}/connections/${hostname}/0/proc_drbd

--
Vlado Bartos
bar...@jadro.org<mailto:bar...@jadro.org>



On 26 Jul 2017, at 08:28, Eric Robinson 
<eric.robin...@psmnv.com<mailto:eric.robin...@psmnv.com>> wrote:

I think that the resync progress in drbd9 is visible in drbdadm status

Progress percent yes, rate no.

--
Eric Robinson



Re: [DRBD-user] Show Current Resync Rate

2017-07-26 Thread Eric Robinson
> I think that the resync progress in drbd9 is visible in drbdadm status
> 

Progress percent yes, rate no.

--
Eric Robinson


Re: [DRBD-user] Show Current Resync Rate

2017-07-25 Thread Eric Robinson
> http://docs.linbit.com/docs/users-guide-9.0/p-work/
> 
> 5.2.2. Status information in /proc/drbd
> 
> '/proc/drbd' is deprecated. While it won’t be removed in the 8.4 series, we
> recommend to switch to other means, like Section 5.2.3, “Status information
> via drbdadm”; or, for monitoring even more convenient, Section 5.2.4, “One-
> shot or realtime monitoring via drbdsetup events2”.
> 
> /proc/drbd is a virtual file displaying basic information about the DRBD
> module. It was used extensively up to DRBD 8.4, but couldn’t keep up with
> the amount of information provided by DRBD 9.
> 

Understood. I'm just not sure why the resync speed is a difficult property to 
determine and report for a given resource. It is very useful information. I get 
that /proc/drbd is no longer available, but is there a reason why the current 
resync rate is not visible somewhere else?

--
Eric Robinson



Re: [DRBD-user] Getting TRIM to Work with DRBD

2017-07-25 Thread Eric Robinson
> > > Well, would someone please tell the folks at SUSE to catch up? DRBD
> > > 9.0.1 is what comes with their latest distro, SLE 12 SP2, with the
> > > latest version of their Enterprise High Availability Extension. Even
> > > the OpenSUSE LEAP repos only go as high as 9.0.6. I can build from
> > > source, but I'll lose SUSE support. It looks like that's the course I must
> take. Thanks for the feedback.
> > >
> >
> > Uh oh, I don't see instructions for building from source in the drbd 9
> > user guide. Has that option been removed? Or are the steps the same as
> > for 8.X?
> 
> Just to be extra clear on that: It got removed from the users guide.
> 
> People are free to build from source. Everything required, minus build
> dependencies obviously, is in the repos/tarballs, even the debian directories
> and spec files.
> 

Thanks for the clarification. I spoke with someone on the Linbit chat line 
yesterday who said that the instructions might be added back into the manual at 
some point. Do you know why they were omitted?

--
Eric Robinson



Re: [DRBD-user] Getting TRIM to Work with DRBD

2017-07-24 Thread Eric Robinson
> Well, would someone please tell the folks at SUSE to catch up? DRBD 9.0.1 is
> what comes with their latest distro, SLE 12 SP2, with the latest version of
> their Enterprise High Availability Extension. Even the OpenSUSE LEAP repos
> only go as high as 9.0.6. I can build from source, but I'll lose SUSE 
> support. It
> looks like that's the course I must take. Thanks for the feedback.
> 

Uh oh, I don't see instructions for building from source in the drbd 9 user 
guide. Has that option been removed? Or are the steps the same as for 8.X?

--
Eric Robinson


Re: [DRBD-user] Getting TRIM to Work with DRBD

2017-07-24 Thread Eric Robinson
> For DRBD 9, we will pretty much ignore any complaints against anything older
> than "latest", which currently is 9.0.8.
> 
> And no, you will not have any luck with 9.0.1. Not at all.
> And not only for discards.
> 

Well, would someone please tell the folks at SUSE to catch up? DRBD 9.0.1 is 
what comes with their latest distro, SLE 12 SP2, with the latest version of 
their Enterprise High Availability Extension. Even the OpenSUSE LEAP repos only 
go as high as 9.0.6. I can build from source, but I'll lose SUSE support. It 
looks like that's the course I must take. Thanks for the feedback.

--
Eric Robinson



Re: [DRBD-user] DRBD Fails to Establish Connection

2017-07-24 Thread Eric Robinson
>   # drbdadm status
>   # drbdadm status --verbose
>   # drbdsetup status
>   # drbdsetup status --statistics
>   # drbdsetup status --verbose --statistics
> 

Well, there you go. I guess I keep running into places where commands do not 
work as they did previously, and it makes me think there must be something 
wrong. Thanks.

--
Eric Robinson



Re: [DRBD-user] Show Current Resync Rate

2017-07-24 Thread Eric Robinson
> >
> >How do I see the resync speed?
> 
> You don't.
> 

Why was this information removed in the transition from v8 to v9? I'm sure 
there is a good reason, but I can't imagine what it is. It has been very useful 
to me in the past.

--
Eric Robinson


Re: [DRBD-user] DRBD Fails to Establish Connection

2017-07-19 Thread Eric Robinson
I was mistaken. The nodes do establish connection. But they don't show any 
resources.

ha11b:/etc/drbd.d # rcdrbd status
● drbd.service - DRBD -- please disable. Unless you are NOT using a cluster 
manager.
   Loaded: loaded (/usr/lib/systemd/system/drbd.service; disabled; vendor 
preset: disabled)
   Active: active (exited) since Wed 2017-07-19 14:40:38 PDT; 19min ago
  Process: 4350 ExecStart=/lib/drbd/drbd start (code=exited, status=0/SUCCESS)
 Main PID: 4350 (code=exited, status=0/SUCCESS)

Jul 19 14:40:37 ha11b drbd[4350]: Starting DRBD resources: [
Jul 19 14:40:37 ha11b drbd[4350]:  create res: ha01_mysql ha02_mysql
Jul 19 14:40:37 ha11b drbd[4350]:prepare disk: ha01_mysql ha02_mysql
Jul 19 14:40:37 ha11b drbd[4350]: adjust disk: ha01_mysql ha02_mysql
Jul 19 14:40:37 ha11b drbd[4350]: prepare net: ha01_mysql ha02_mysql
Jul 19 14:40:37 ha11b drbd[4350]: attempt to connect: ha01_mysql ha02_mysql
Jul 19 14:40:37 ha11b drbd[4350]: ]
Jul 19 14:40:38 ha11b drbd[4350]: WARN: stdin/stdout is not a TTY; using 
/dev/consoledrbdadm: Unknown command 'sh-b-pri'
Jul 19 14:40:38 ha11b drbd[4350]: .
Jul 19 14:40:38 ha11b systemd[1]: Started DRBD -- please disable. Unless you 
are NOT using a cluster manager..

ha11b:/etc/drbd.d # cat /proc/drbd
version: 9.0.1-1 (api:2/proto:86-111)
GIT-hash: 86e443973082570aeb651848db89e0c7b995c306 build by abuild
Transports (api:14): tcp (1.0.0)




--
Eric Robinson
Chief Information Officer
Physician Select Management, LLC
775-885-2211 x 112

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Eric Robinson
> Sent: Wednesday, July 19, 2017 10:02 AM
> To: Roland Kammerer <roland.kamme...@linbit.com>; drbd-
> u...@lists.linbit.com
> Subject: Re: [DRBD-user] DRBD Fails to Establish Connection
> 
> > Sorry, but that is so outdated that it is not even funny.
> 
> It's what SUSE supports. They are looking into it.
> 
> Also, I did try 9.0.6 but got the same result.
> 
> --
> Eric Robinson
> 
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Fails to Establish Connection

2017-07-19 Thread Eric Robinson
> Sorry, but that is so outdated that it is not even funny.

It's what SUSE supports. They are looking into it. 

Also, I did try 9.0.6 but got the same result. 

--
Eric Robinson



[DRBD-user] A Tale of Two Identical Servers, but DRBD Only Supports Trim on One of Them

2017-07-14 Thread Eric Robinson
I have two identical Dell PowerEdge R610 servers:

Disks: 6 x Samsung SSD 840 Pro 512GB
Controller: LSI-9207-8i, same firmware
OS: SLES 12 SP2, kernel 4.4.59-92.24
DRBD: 9.0.6+git.08cda19-57.1.x86_64
Backing Device: mdraid 5, /dev/md2
Config file: /sys/module/raid456/parameters/devices_handle_discard_safely=Y
Config file: /etc/lvm/lvm.conf contains: issue_discards = 0
Config file: /etc/crypttab is not used

However, one server shows DRBD supporting TRIM...

NAME        DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                0      512B       4G         1
├─sda1             0      512B       4G         1
│ └─md0            0      128K     256M         0
├─sda2             0      512B       4G         1
│ └─md1            0      128K     256M         0
├─sda3             0      512B       4G         1
├─sda4             0      512B       4G         1
├─sda5             0      512B       4G         1
│ └─md2            0      128K     256M         0
│   └─drbd0        0      128K     128M         0


...but the other does not, even though the underlying devices do (note the 0 
values for DISC-GRAN and DISC-MAX for drbd)...

NAME        DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                0      512B       4G         1
├─sda1             0      512B       4G         1
│ └─md0            0      128K     256M         0
├─sda2             0      512B       4G         1
│ └─md1            0      128K     256M         0
├─sda3             0      512B       4G         1
├─sda4             0      512B       4G         1
├─sda5             0      512B       4G         1
│ └─md2            0      128K     256M         0
│   └─drbd0        0        0B       0B         0


What am I missing?
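
For reference, output like the tables above comes from lsblk's discard columns; a scripted check of a single device might look like this (the device path is a placeholder):

```shell
# Report whether a block device advertises discard support, judging by
# its DISC-GRAN value as in the lsblk output above.
dev=/dev/drbd0   # placeholder device
gran=$(lsblk -n -o DISC-GRAN "$dev" 2>/dev/null | head -n1 | tr -d ' ')
if [ -z "$gran" ] || [ "$gran" = "0B" ]; then
    msg="no discard support reported for $dev"
else
    msg="discard supported on $dev (granularity $gran)"
fi
echo "$msg"
```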

--
Eric Robinson



Re: [DRBD-user] Confusion about DRBD 9

2017-07-12 Thread Eric Robinson
In addition to the information in the email below, I am seeing the following 
messages in the logs. Why would the folder /var/lib/drbd.d not exist on either 
server? This is a fresh install of DRBD using the factory HA package that comes 
with SLES 12 SP2. 

I have the following folders...

/var/lib/drbd
/var/lib/drbdmanage
/etc/drbd.d

I do not have...

/var/lib/drbd.d



ha11b:/var/log # grep ha02 messages
2017-07-11T11:45:59.735772-07:00 ha11b org.drbd.drbdmanaged[1369]:   Failed to 
find logical volume "drbdpool/ha02_mysql_00"
2017-07-11T11:45:59.813490-07:00 ha11b org.drbd.drbdmanaged[1369]:   Logical 
volume "ha02_mysql_00" created.
2017-07-11T11:45:59.832280-07:00 ha11b drbdmanaged[4631]: ERROR  cannot 
write to configuration file '/var/lib/drbd.d/drbdmanage_ha02_mysql.res' or 
'/var/lib/drbd.d/drbdmanage_global_common.conf', error returned by the OS is: 
No such file or directory
2017-07-11T11:55:59.953695-07:00 ha11b DRBDmanage:4861: spawning ['drbdadm', 
'-c', '-', 'attach', 'ha02_mysql/0']No valid meta data found
2017-07-11T11:55:59.953890-07:00 ha11b DRBDmanage:4861: Command 'drbdmeta 101 
v09 /dev/drbdpool/ha02_mysql_00 internal apply-al' terminated with exit code 255
2017-07-11T11:56:00.037829-07:00 ha11b DRBDmanage:4877: spawning ['drbdadm', 
'-c', '-', 'connect', 'ha02_mysql']
2017-07-11T12:02:39.286130-07:00 ha11b drbdmanaged[4631]: ERROR  cannot 
write to configuration file '/var/lib/drbd.d/drbdmanage_ha02_mysql.res' or 
'/var/lib/drbd.d/drbdmanage_global_common.conf', error returned by the OS is: 
No such file or directory
2017-07-11T12:02:39.367599-07:00 ha11b drbdmanaged[4631]: ERROR  cannot 
write to configuration file '/var/lib/drbd.d/drbdmanage_ha02_mysql.res' or 
'/var/lib/drbd.d/drbdmanage_global_common.conf', error returned by the OS is: 
No such file or directory
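
One hedged workaround sketch (untested; the real fix may belong in the drbdmanage packaging) is simply to create the directory the daemon expects. The temp-directory prefix below is only there so the sketch is safe to run as a non-root user; on the real node the prefix would be empty:

```shell
# Create the directory drbdmanage is trying to write its .res files into.
prefix=$(mktemp -d)          # on the real node: prefix="" (and run as root)
mkdir -p "$prefix/var/lib/drbd.d"
ls -ld "$prefix/var/lib/drbd.d"
```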

--
Eric Robinson

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Eric Robinson
> Sent: Wednesday, July 12, 2017 11:06 AM
> To: Roland Kammerer <roland.kamme...@linbit.com>; drbd-
> u...@lists.linbit.com
> Subject: Re: [DRBD-user] Confusion about DRBD 9
> 
> Thanks so much for your reply!
> 
> > > 1.   When you use drbdmanage to configure a cluster, does it
> > > create configuration files somewhere? I don't see my resources
> > > mentioned in the files under /etc/drbd.d or /var/lib/drbd.d.
> >
> > Should be in /var/lib/drbd.d/drbdmanage_.res
> 
> There is no directory by that name. There is only /var/lib/drbd, which
> contains...
> 
> -rw-r--r-- 1 root root 34 Jul 11 11:32 drbd-minor-0.lkbd
> -rw-r--r-- 1 root root 34 Jul 11 11:32 drbd-minor-1.lkbd
> 
> ..and also /etc/drbd.d, which contains...
> 
> -rw-r--r-- 1 root root  684 Jul 11 11:33 drbdctrl.res
> -rw-r--r-- 1 root root  541 Mar  8  2016 drbdctrl.res_template
> -rw-r--r-- 1 root root  211 Mar  8  2016 drbdmanage-resources.res
> -rw-r--r-- 1 root root 2062 Mar 27 07:37 global_common.conf
> 
> > > 3.   drbdmanage -list-assignments shows failed or pending actions.
> >
> > That requires log files to make assumptions.
> 
> Which log files should I copy and paste from?
> 
> > > 4.   What's the point of using drbdmanage if I'm going to have to
> > > do the old-style manual configuration anyway?
> >
> > You should not have to.
> 
> That's what I thought, but I'm clearly missing something.
> 
> --
> Eric Robinson
> 
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

