Re: [DRBD-user] Is it free?

2019-08-07 Thread Julien Escario
Hello,
Sorry, I won't answer most of your questions (as I'm not using DRBD
for Kubernetes), but to answer your first one : yes, DRBD/Linstor
is completely free and a really nice piece of software.

But as it's free, you'll have to learn most of the caveats (and there are
some) by yourself, debug by yourself and, after that, contribute if possible.

So if you're asking really precise questions, sometimes someone
from Linbit is kind enough to answer, or is interested because your concern
could also be a bug, but most of the time you'll have to test and debug
(your setup) by yourself or wait for the community to give you advice.

That's how the game is played for open and/or free software.

Julien

P.S. : I tried to set up DRBD over Wireguard and I confirm you'll have
far less throughput and more latency (and CPU usage) than with
'classical' IP. It could be enough, it depends on your needs.

Oh, and remember : if your setup is correct, all read operations are
served from your LOCAL storage. Only writes are replicated (and depend heavily
on latency for protocols B & C). Hint : there's no real use case for protocol B.
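
Just as an illustration (plain DRBD configuration syntax, shown only as a
reminder ; with Linstor the generated resource files already take care of this),
the replication protocol is a net-section option :

resource r0 {
  net {
    protocol C;   # A = asynchronous, B = memory-synchronous, C = fully synchronous
  }
  ...
}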

Le 07/08/2019 à 16:01, Vito Botta a écrit :
> Hi Roland! Thanks for your reply! I didn't get a notification, I just noticed 
> it while checking the archives in the browser. So I hope this email updates 
> the thread (never used this so not sure how it works :D).
> 
> Thanks for the answers! I did more tests with replication and also changed 
> the set up to use thin pools for thin provisioning and snapshots. All seems 
> to work great.
> 
> When I was setting it up again in Kubernetes, I found that the CSI yaml on 
> Github references images for the 0.7.0 version not yet available on Quay, so 
> I had to use 0.6.4 for now.
> 
> A few more questions if you don't mind:
> 
> - How do I handle upgrades of the Kubernetes components? Is it enough to just 
> apply the new version of the yaml manifests or do I need to do something else?
> 
> - What about upgrades to Linstor itself? Do I just upgrade packages when new 
> ones are available in the apt repo or do I need to do something else? Will 
> upgrading cause disruption to volumes in use? I am scared that an apt 
> dist-upgrade may screw things up with storage, cause split brains or whatever.
> 
> - From reading the docs I am not sure of what is the default "anti 
> split-brain" configuration. Does Linstor recover by itself by default? 
> 
> - I am trying to use Linstor on top of a Wireguard VPN, since I am looking to 
> encrypt all the traffic between the nodes  - my provider Hetzner Cloud has 
> private networking but somehow they recommend using encryption to protect 
> sensitive data. When using Wireguard, benchmarks in a replicated setup show 
> 1/3 lower write speed compared to running without VPN, at 40 to ~66 MB/sec or 
> so. Is this OK for a good replication and for use with databases (MySQL) or 
> is it too slow? Without VPN I was getting 100 MB/sec or more...
> 
> Thanks!
> Vito
> 
> ___
> Star us on GITHUB: https://github.com/LINBIT
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
> 



___
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Linstor/DRBD9 : Initial sync stuck at 30Mbps

2019-05-06 Thread Julien Escario
Hello,
Sorry if this has already been answered : I checked the archives and found
similar issues with DRBD8 but none with Linstor and DRBD9.

This is not really a 'problem' but more of a config/performance issue :
when I 'move' a resource from another storage backend to Linstor
storage on Proxmox, the sync is capped at 3MB/s (completely flat).

Drbdtop reports this :
Sent: total:23.8MiB Per/Sec:3.0MiB

Let me first confirm my hardware is (hopefully) capable of doing far
more (10Gbps network and full SSD ZFS storage with nvme cache).

So it seems I misconfigured something somewhere.

I tried to change a few values :
# linstor controller drbd-options --max-buffers=36864 \
    --rcvbuf-size=2097152 --sndbuf-size=1048576
# linstor controller drbd-options --c-fill-target=10240 \
    --c-max-rate=737280 --c-min-rate=20480 --c-plan-ahead=10

That was described as optimal for 10Gbps network on some howtos I found.

Just in case it wasn't applied on the fly, I ran drbdadm adjust on the
resource (both nodes).

Values are stored in /var/lib/linstor.d/linstor_common.conf file.
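
To double-check that those options actually reached a resource, something like
this should show the effective values (a quick sketch ; the resource name is
just an example) :

# drbdsetup show vm-100-disk-1     # currently active configuration in the kernel
# drbdadm dump vm-100-disk-1       # what drbdadm would apply from the config files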

No speed change in this sync.

If I run 2 similar syncs at the same time, each of them is stuck at 3MB/s.

What did I miss ?

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Display linstor's backing device

2019-01-07 Thread Julien Escario
Hello,
We're using ZFSThin as backend for our Linstor cluster. Nothing fancy.

I'm trying to set up a backup with ZFS snapshots and zfs send/receive.

First : is this a bad idea ? I know DRBD integrates its own snapshot
system (LVM only ?) but it can't be exported to a non-DRBD system as-is,
AFAIK.

So, to automate the backup, I have to write a script that checks which
resources are primary on the host (and that the host is not a diskless
client) and gets the ZFS backing device for a resource.

So, main question : is there a linstor command to get the 'grepable'
backing device for a resource on a host ?

Without having to grep through the files in /var/lib/linstor.d ? (prone to
errors).

I tried some linstor commands without success. Perhaps with a command
from drbd-utils ?
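
For what it's worth, here is a rough sketch of what such a script could look
like using only drbd-utils (relying on drbdadm's sh-resources, role and
sh-ll-dev subcommands ; not battle-tested, and the backing device path in the
comment is just an example) :

#!/bin/bash
# back up the ZFS backing device of every resource that is Primary on this host
for res in $(drbdadm sh-resources); do
    role=$(drbdadm role "$res" 2>/dev/null)
    [ "${role%%/*}" = "Primary" ] || continue            # skip Secondary resources
    backing=$(drbdadm sh-ll-dev "$res" 2>/dev/null)      # e.g. /dev/zvol/drbdpool/vm-100-disk-1_0
    [ -n "$backing" ] && [ "$backing" != "none" ] || continue   # diskless: no backing device
    echo "resource $res -> $backing"
    # zfs snapshot / zfs send could go here
done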

Thanks for your advice(s),
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] proxmox-linstor API version

2019-01-07 Thread Julien Escario
Hello,
I'm trying to track down (and, why not, fix) this warning :
Plugin "PVE::Storage::Custom::LINSTORPlugin" is implementing an older
storage API, an upgrade is recommended


I found the reason in PVE/Storage.pm :
use constant APIVER => 2;

And in /usr/share/perl5/PVE/Storage/Custom/LINSTORPlugin.pm :
my $APIVER = 1;

The system compares those vars and complains about the difference (but it's
still compatible, dunno for how long).

I doubt there's more to change than setting $APIVER to 2.

Any idea how to find what should be changed in API calls in order to
work on a patch ?

I can't even find any changelog of Proxmox's Storage API changes from V1
to V2.

Best regards,
Julien Escario

P.S. : we're getting a ton of mails for each backup task.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Still kernel crashes (ZFS or DRBD ?)

2018-12-12 Thread Julien Escario
Hello,
Yesterday and today, I experienced a strange crash when live migrating a
VM inside a Proxmox cluster from a diskless node to another node (with
disk attached).

I'm using ZFSThin as backend.

You'll find below the kernel error message I was able to catch
before everything went wrong.

I'm not that comfortable with reading such errors. It seems it crashed
in zfs_range_lock, called via spl_kmem_alloc, so it looks more like a ZFS bug.

It happened just after I moved a disk from NFS storage to Linstor
storage (online). I don't know if both storage nodes had the
complete dataset. Perhaps that's the problem : becoming primary on
a node that's not completely synced.

Can it be an explanation ?

Any idea to guide me through the resolution ?

Feel free to ask details if I'm not clear, I'm still trying to complete
my analysis.

Thanks a lot,
Julien

> Dec 12 19:22:18 vm13 kernel: [92347.195898] BUG: unable to handle kernel 
> paging request at c0559fce
> Dec 12 19:22:18 vm13 kernel: [92347.195950] IP: avl_insert+0x4b/0xd0 [zavl]
> Dec 12 19:22:18 vm13 kernel: [92347.195973] PGD 1d84e0e067 P4D 1d84e0e067 PUD 
> 1d84e10067 PMD 3f63f00067 PTE 3f6da55061
> Dec 12 19:22:18 vm13 kernel: [92347.196011] Oops: 0003 [#1] SMP PTI
> Dec 12 19:22:18 vm13 kernel: [92347.196023] Modules linked in: veth tcp_diag 
> inet_diag binfmt_misc drbd_transport_tcp(O) ebtable_filter ebtables 
> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter 
> ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_physdev nf_conntrack_ipv4 
> nf_defrag_ipv4 xt_comment xt_tcpudp xt_addrtype xt_conntrack nf_conntrack 
> xt_set xt_mark ip_set_hash_net ip_set xt_multiport iptable_filter 8021q garp 
> mrp softdog nfnetlink_log nfnetlink nls_iso8859_1 vhost_net vhost tap ib_iser 
> rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi 
> scsi_transport_iscsi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp 
> coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul crc32_pclmul 
> ghash_clmulni_intel pcbc aesni_intel aes_x86_64 zfs(PO) crypto_simd 
> glue_helper cryptd zunicode(PO) zavl(PO)
> Dec 12 19:22:18 vm13 kernel: [92347.196265]  intel_cstate icp(PO) snd_pcm 
> snd_timer intel_rapl_perf snd ast soundcore ttm pcspkr drm_kms_helper joydev 
> input_leds drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt 
> lpc_ich mei_me mei wmi shpchp ioatdma ipmi_si acpi_power_meter acpi_pad 
> mac_hid zcommon(PO) znvpair(PO) spl(O) drbd(O) libcrc32c ipmi_devintf sunrpc 
> ipmi_msghandler ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq 
> hid_generic usbkbd usbmouse usbhid hid i2c_i801 igb(O) ahci libahci ixgbe dca 
> ptp pps_core mdio
> Dec 12 19:22:18 vm13 kernel: [92347.196428] CPU: 11 PID: 10103 Comm: 
> drbd_r_vm-145-d Tainted: P   O 4.15.18-9-pve #1
> Dec 12 19:22:18 vm13 kernel: [92347.196458] Hardware name: Supermicro Super 
> Server/X10SRW-F, BIOS 3.1 06/06/2018
> Dec 12 19:22:18 vm13 kernel: [92347.196485] RIP: 0010:avl_insert+0x4b/0xd0 
> [zavl]
> Dec 12 19:22:18 vm13 kernel: [92347.196499] RSP: 0018:aedaad6dbc40 
> EFLAGS: 00010282
> Dec 12 19:22:18 vm13 kernel: [92347.196519] RAX:  RBX: 
> 8ea096f74900 RCX: c0559fcf
> Dec 12 19:22:18 vm13 kernel: [92347.196541] RDX:  RSI: 
> 8ea096f74908 RDI: 8e9e1a7ac560
> Dec 12 19:22:18 vm13 kernel: [92347.196562] RBP: aedaad6dbc90 R08: 
> c0559fce R09: 8ea23e807180
> Dec 12 19:22:18 vm13 kernel: [92347.196597] R10: 8ea096f74900 R11: 
>  R12: 8e9e1a7ac530
> Dec 12 19:22:18 vm13 kernel: [92347.196628] R13: 8ea096f74200 R14: 
>  R15: 
> Dec 12 19:22:18 vm13 kernel: [92347.196654] FS:  () 
> GS:8ea23f0c() knlGS:
> Dec 12 19:22:18 vm13 kernel: [92347.196701] CS:  0010 DS:  ES:  CR0: 
> 80050033
> Dec 12 19:22:18 vm13 kernel: [92347.196736] CR2: c0559fce CR3: 
> 001d84e0a001 CR4: 003626e0
> Dec 12 19:22:18 vm13 kernel: [92347.196758] DR0:  DR1: 
>  DR2: 
> Dec 12 19:22:18 vm13 kernel: [92347.196779] DR3:  DR6: 
> fffe0ff0 DR7: 0400
> Dec 12 19:22:18 vm13 kernel: [92347.196812] Call Trace:
> Dec 12 19:22:18 vm13 kernel: [92347.196872]  ? zfs_range_lock+0x4bf/0x5c0 
> [zfs]
> Dec 12 19:22:18 vm13 kernel: [92347.196893]  ? spl_kmem_alloc+0xae/0x1a0 [spl]
> Dec 12 19:22:18 vm13 kernel: [92347.196939]  zvol_request+0x16e/0x300 [zfs]
> Dec 12 19:22:18 vm13 kernel: [92347.197879]  generic_make_request+0x123/0x2f0
> Dec 12 19:22:18 vm13 kernel: [92347.198751]  submit_bio+0x73/0x140
> Dec 12 19:22:18 vm13 kernel: [92347.199616]  ? submit_bio+0x73/0x140
> Dec 12 19:22:18 vm13 kernel: [92347.200475]  ? 
> drbd_flush_after_epoch+0x119/0x360 [drbd]
> Dec 12 19:22:18 vm13 kernel: [92347.201652]  
> drbd_flush_after_epoch+0x1ae/0x360 [drbd]
> Dec 12 19:22

Re: [DRBD-user] linstor-proxmox broken after upgrading to PVE/libpve-storage-perl/stable 5.0-32

2018-12-10 Thread Julien Escario
Le 27/11/2018 à 18:08, Yannis Milios a écrit :
> Upgraded to linstor-proxmox (3.0.2-3) and seems to be working well with
> libpve-storage-perl  (5.0-32).
> There's a warning notification during live migrates about the upgraded
> storage API, but at the end the process is completed successfully..
> 
> "Plugin "PVE::Storage::Custom::LINSTORPlugin" is implementing an older
> storage API, an upgrade is recommended"

Hello,
Yannis, did you manage to get rid of this warning ? Same thing here
since the last upgrade.

Nothing really bad happening except this annoying warning ...

Best regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Proxmox5.2-12 + latest linstor drbdadm --config-to-exclude error

2018-12-03 Thread Julien Escario
Le 03/12/2018 à 09:47, Roland Kammerer a écrit :
> On Fri, Nov 30, 2018 at 12:46:08PM +0300, Max O.Kipytkov wrote:
>>
>> The external command sent the follwing error information:
>> drbdadm: unrecognized option '--config-to-exclude'
>> try 'drbdadm help'
> 
> Did you load a DRBD9 kernel module or an older 8.4 one that is shipped
> with you kernel? (cat /proc/drbd). Load a DRBD9 one.

I just had the same error a few days ago : the loaded drbd kernel module is the
default one included in the pve-kernel package.

Simply :
rmmod drbd
modprobe drbd

And your setup will be fine. Just reboot in order to be sure the correct
module is loaded at boot.

Still, I don't really understand why the kernel persists in loading the 8.4
module, even after installing the drbd-dkms module and multiple restarts.
Best regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Linstor node type & backup

2018-11-30 Thread Julien Escario
Hello,
I can't really find useful information about this in docs : what's the
difference between SATELLITE node and COMBINED node ?

In a recent lab, I deployed 2 nodes that are both of SATELLITE type,
and one node is also running the controller.

Everything runs perfectly (I have to specify --controllers on one node
but that seems normal).

So, what's the use for the COMBINED type ? Does it replicate the
/var/lib/linstor/ directory ?

And just to know : so far, I'm backing up the
/var/lib/linstor/linstordb.mv.db file. Is that sufficient to start
another controller if the first one is down ?
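
For what it's worth, a cautious way to copy that database is to quiesce the
controller first (a sketch ; the paths are the defaults mentioned above, and
the backup destination is just an example) :

# systemctl stop linstor-controller
# cp /var/lib/linstor/linstordb.mv.db /backup/linstordb.mv.db.$(date +%F)
# systemctl start linstor-controller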

Thanks !
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server 0.7.3/linstor-client 0.7.2 release

2018-11-30 Thread Julien Escario
Le 30/11/2018 à 10:43, Roland Kammerer a écrit :
> On Fri, Nov 30, 2018 at 10:30:05AM +0100, Julien Escario wrote:
>> Ok, nervermind :
>> # cat /proc/drbd
>> version: 8.4.10 (api:1/proto:86-101)
>> srcversion: 17A0C3A0AF9492ED4B9A418
> 
> There will be a check in drbd-utils in the next release, and one in
> LINSTOR as well. We know it is a pretty frequent problem and that error
> logging so far is not really sufficient.

Fun fact :
when running 'drbdadm -h' with the 8.4 module loaded, there's no
--config-to-exclude option in the output.

When running the same command with the DRBD 9 module, the option 'magically' appears !

Which means drbdadm checks the module version at runtime and adapts its output
accordingly, nice !
But this can lead to misunderstanding why the option wasn't
available ;-)

Best regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server 0.7.3/linstor-client 0.7.2 release

2018-11-30 Thread Julien Escario
Le 30/11/2018 à 10:35, Roland Kammerer a écrit :
> On Fri, Nov 30, 2018 at 10:27:10AM +0100, Julien Escario wrote:
>>> Any chances you are using your distributions DRBD8.4 module instead of
>>> the DRBD9 one you should use?
>>
>> I know it's a frequent answer on this list but I don't think so :
>> # drbdadm --version
> 
> Oh, I do think so :)
> 
>> DRBD_KERNEL_VERSION_CODE=0x08040a
> ^^ here we go
> 
> cat /proc/drbd

My apologies.
I don't really understand why the drbd 8.4 module was loaded (from the pve-kernel
package) instead of the drbd-dkms (aka 9) one ...

I just ran dpkg-reconfigure drbd-dkms and rebooted the servers to check that the
correct version is loaded at boot time.

Seems fine this time. I really must find time to understand the underlying
dkms module system.

Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server 0.7.3/linstor-client 0.7.2 release

2018-11-30 Thread Julien Escario
Ok, nevermind :
# cat /proc/drbd
version: 8.4.10 (api:1/proto:86-101)
srcversion: 17A0C3A0AF9492ED4B9A418

Sorry,
Julien

Le 30/11/2018 à 10:27, Julien Escario a écrit :
>> Any chances you are using your distributions DRBD8.4 module instead of
>> the DRBD9 one you should use?
> 
> I know it's a frequent answer on this list but I don't think so :
> # drbdadm --version
> DRBDADM_BUILDTAG=GIT-hash:\ d458166f5f4740625e5ff215f62366aca60ca37b\
> build\ by\ @buildsystem.linbit\,\ 2018-10-29\ 12:27:38
> DRBDADM_API_VERSION=1
> DRBD_KERNEL_VERSION_CODE=0x08040a
> DRBDADM_VERSION_CODE=0x090600
> DRBDADM_VERSION=9.6.0
> 
> # cat /etc/apt/sources.list.d/linbit.list
> deb http://packages.linbit.com/proxmox/ proxmox-5 drbd-9.0
> 
> # dpkg -l linstor*
> ii  linstor-client  0.7.2-1
> ii  linstor-common  0.7.3-1
> ii  linstor-controller  0.7.3-1
> ii  linstor-proxmox 3.0.2-3
> ii  linstor-satellite   0.7.3-1
> 
> And for drbd* (only installed packages) :
> ii  drbd-dkms   9.0.16-1
> ii  drbd-utils  9.6.0-1
> ii  drbdtop 0.2.1-1
> 
> Double checked versions and all seems pretty uptodate (installed yesterday).
> 
> Thanks for your help,
> Julien Escario
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
> 
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server 0.7.3/linstor-client 0.7.2 release

2018-11-30 Thread Julien Escario
> Any chances you are using your distributions DRBD8.4 module instead of
> the DRBD9 one you should use?

I know it's a frequent answer on this list but I don't think so :
# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ d458166f5f4740625e5ff215f62366aca60ca37b\
build\ by\ @buildsystem.linbit\,\ 2018-10-29\ 12:27:38
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x08040a
DRBDADM_VERSION_CODE=0x090600
DRBDADM_VERSION=9.6.0

# cat /etc/apt/sources.list.d/linbit.list
deb http://packages.linbit.com/proxmox/ proxmox-5 drbd-9.0

# dpkg -l linstor*
ii  linstor-client  0.7.2-1
ii  linstor-common  0.7.3-1
ii  linstor-controller  0.7.3-1
ii  linstor-proxmox 3.0.2-3
ii  linstor-satellite   0.7.3-1

And for drbd* (only installed packages) :
ii  drbd-dkms   9.0.16-1
ii  drbd-utils  9.6.0-1
ii  drbdtop 0.2.1-1

Double checked versions and all seems pretty uptodate (installed yesterday).

Thanks for your help,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server 0.7.3/linstor-client 0.7.2 release

2018-11-30 Thread Julien Escario
Le 22/11/2018 à 16:07, Rene Peinthor a écrit :
> Hi Everyone!

Hello,

> We finished another linstor release, with this comes a new snapshot
> rollback command and usual fixes.
> The snapshot rollback command is used to do in-place rollbacks to the
> last snapshot taken.
> 
> As always the release highlights:
> 
> linstor-server 0.7.3

I just tried a new 2-node cluster with this version and on resource
creation (from PVE, but I doubt there's a relation), it fails with this error
(useful part of the log) :

Additional information:
The full command line executed was:
drbdadm --config-to-test /var/lib/linstor.d/vm-3000-disk-1.res_tmp
--config-to-exclude /var/lib/linstor.d/vm-3000-disk-1.res sh-nop

The external command sent the following output data:


The external command sent the follwing error information:
drbdadm: unrecognized option '--config-to-exclude'
try 'drbdadm help'



It seems drbdadm does not have a '--config-to-exclude' option. Anything
I missed ?

I can confirm the same error is thrown when using the command line :

# linstor resource create vm12 backups --storage-pool pool_vm

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] .drbdctrl and resource roles

2018-11-09 Thread Julien Escario
Le 09/11/2018 à 11:20, Roland Kammerer a écrit :
> On Thu, Nov 08, 2018 at 09:58:28PM +0200, Daniel Hertanu wrote:
>> I would like to understand why, on server2, .drbdctrl role is secondary and
>> res1 role is primary, while on server3 .drbdctrl role is primary and res1
>> role is secondary.
> 
> Why not? A resource is Primary on that host that "uses" it. Auto-promote
> with DRBD9 and an open(), or set to Primary.
> 
>> I can switch res1 primary role between server2 and server3 without problems
>> but I can't do anything about .drbdctrl. Trying to change the role to
>> secondary on server3 doesn't return any error but it's just not happening.
> 
> a) don't touch the control-volume. Drbdmanage, and only drbdmanage is
> responsible to switch it Primary on a node. You do not do that manually,
> DM selects one. You can stop all nodes, manually switch it to Primary on
> your favorite host and then start all cluster nodes. Then you
> "preselected" a leader node.

May I add that it's already a strange state : AFAIK, when no
management task is running, BOTH .drbdctrl volumes should be in secondary state.
It only becomes primary when drbdmanage changes 'something' (moves a disk,
reconfigures, etc ...).

> b) forget everything in a), remove drbdmanage, and install LINSTOR.

You'll save yourself some time.

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] linstor-proxmox-3.0.0-rc1

2018-10-05 Thread Julien Escario
Le 05/10/2018 à 11:06, Roland Kammerer a écrit :
> Dear Proxmox users,
> 
> There will be a new release soon. Actually this would have been it, but
> hey, why rush a new release on a Friday when you don't have to :-). So
> let's call this an rc1.
> 
> Notable changes:
> - Multi-Pool support

One word : WOW !

> - Fix a race between LINSTOR thinks a resource is ready vs. it being
>   actually usable. In rare cases this bit people when for example
>   converting from local storage to DRBD, and Proxmox started with its
>   "dd" too early even though the device node was not usable.
> - Many clean ups. Now it looks like real Perl (is that even a good
>   thing?). Thanks Lars! And as I can't read that Perl anymore, I hope he
>   also maintains it from now on. Do you? Do you? :-)

Please, stop trolling about Perl. You should already know that Perl 7
will fix all those typos and misunderstandings ;-)

> Please test if you can, if I don't hear any complaints, that will be the
> final version released early next week.

Me ! me ! I have one : we can't grow a resource from the interface (web
or cmdline).

It seems that the resource is correctly grown but the Proxmox core isn't
correctly informed about it.
More details here :
http://lists.linbit.com/pipermail/drbd-user/2018-September/024503.html

Do you want more feedback ? I can reserve one hour to reproduce this.

Thanks a lot,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdadm down failed (-12) - blocked by drbd_submit

2018-10-02 Thread Julien Escario
Le 02/10/2018 à 19:56, Radoslaw Garbacz a écrit :
> Hi,
> 
> 
> I have a problem, which (from what I found) has been discussed, however not in
> the particular case, which I experienced, so I would be grateful for any
> suggestions of how to deal with it.

Your problem sounds pretty similar to a recent experience we had BUT with DRBD9
(and drbdmanage).

You can try to force a disconnect by firewalling ports 7789/7790 with iptables on
the nodes and then force the socket to close with http://killcx.sourceforge.net/.
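
Something along these lines (illustrative only ; adjust the port range to the
resources involved and remove the rules once done) :

# temporarily block DRBD replication traffic in both directions
iptables -A INPUT  -p tcp --dport 7789:7790 -j DROP
iptables -A OUTPUT -p tcp --dport 7789:7790 -j DROP
# ... kill the stuck TCP connections with killcx, then clean up :
iptables -D INPUT  -p tcp --dport 7789:7790 -j DROP
iptables -D OUTPUT -p tcp --dport 7789:7790 -j DROP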

I didn't manage to get rid of this situation without rebooting a node.

I was told to upgrade the drbd kernel module.

Good luck !
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server v0.6.5, Linstor-client 0.6.2 release

2018-10-02 Thread Julien Escario
Le 02/10/2018 à 11:31, Rene Peinthor a écrit :
> Hi Everyone!
> 
> This mostly a bugfix release, one change that needs mentioning is that
> all delete commands will now wait until the resource is actually deleted on 
> the
> satellites.

Great, thank you !

> linstor-server 0.6.5
> 
>  * Fix: Thin drivers often didn't correctly skip the initial sync

Quickly tested in my lab and I can confirm this item : the initial sync is now
correctly skipped.

Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdmanage pure client write speed

2018-09-25 Thread Julien Escario
Le 24/09/2018 à 16:36, Brice CHAPPE a écrit :
> Hi mailing !
> 
>  
> 
> I have three nodes drbdmanage cluster.
> 
> Two nodes work as storage backend (S1/S2).
> 
> One node as satellite pure client (for future nova usage)
> 
> I work on 20GB/s LACP network between storage backends and satellite pure 
> client
> node
> 
>  
> 
> So, when I bench on local drbd on storage node with two nodes connected with :
> 
> dd if=/dev/zero of=/dev/drbd104 bs=1M count=512 oflag=direct
> 
> I have almost 680 MB/s => It is ok for me
> 
>  
> 
> After I assign the resource to the satellite node.
> 
> I try the same thing on it :
> 
> dd if=/dev/zero of=/dev/drbd104 bs=1M count=512 oflag=direct
> 
> I get 420MB/s => why ?

You're experiencing latency ;-)
Check using scripts here :
https://docs.linbit.com/docs/users-guide-9.0/#p-performance

With a little math, you'll understand what's happening (consider your 1M block
size and retry with smaller I/Os).
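
For example, something like this makes the latency penalty much more visible
than a 1M sequential write (same device as in your test, purely illustrative) :

# small synchronous writes: throughput is now dominated by network round-trip latency
dd if=/dev/zero of=/dev/drbd104 bs=4k count=2000 oflag=direct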

Best regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] linstor-proxmox : online grow of resources

2018-09-25 Thread Julien Escario
Le 24/09/2018 à 13:19, Robert Altnoeder a écrit :
> On 09/24/2018 01:03 PM, Julien Escario wrote:
>> Hello,
>> When trying to resize disk (aka grow only) on Proxmox interface for a
>> linstor-backed device, this error is thrown :
>> VM 2000 qmp command 'block_resize' failed - Cannot grow device files (500)
>>
>> BUT resource is effectively growed in linstor and out of sync datas are 
>> synced.
>> drbdpool/vm-2000-disk-1_0  27,2G  6,91T  27,2G  -
> 
> You can check the size of the DRBD device, /dev/drbd1003 according to
> information below, to ensure that the size change was completed by the
> LINSTOR & DRBD layers. If the size of the DRBD device has also been
> updated, then the problem is somewhere outside of LINSTOR & DRBD.

I really don't think it's a problem with Linstor or DRBD ; it's more specific to
the linstor-proxmox plugin.

For an unknown reason, the plugin returns an error code to the Proxmox backend when
resizing, so the Proxmox config isn't updated with the new size and the VM isn't
informed of the change (there should be a KVM-specific call for this).

For example, with VM running :


# qm resize 2000 virtio1 +5G
SUCCESS:
Description:
Volume definition with number '0' of resource definition 'vm-2000-disk-2'
modified.
Details:
Volume definition with number '0' of resource definition 'vm-2000-disk-2'
UUID is: dea1ca6b-a2af-445a-8005-65a12974779e
VM 2000 qmp command 'block_resize' failed - Cannot grow device files


% fdisk -l /dev/vdb
Disk /dev/vdb: 26 GiB, 27917287424 bytes, 54525952 sectors


# fdisk -l /dev/drbd1003
Disk /dev/drbd1003 : 31 GiB, 33285996544 bytes, 65011712 sectors

# qm rescan
rescan volumes...
VM 2000: update disk 'virtio1' information.

But still the same size inside the VM. I can't see how Proxmox informs the VM of
the size change.
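
If it helps for testing by hand, the resize notification can probably be pushed
to the guest through the QEMU monitor ; a sketch (the drive name 'drive-virtio1'
and the target size are assumptions based on the output above) :

# qm monitor 2000
qm> info block                        # find the exact drive name
qm> block_resize drive-virtio1 31G    # tell QEMU (and the guest) about the new size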


>> At last remark, still for the same resource, ZFS shows much larger volume :
>> ╭──╮
>> ┊ ResourceName   ┊ VolumeNr ┊ VolumeMinor ┊ Size   ┊ State ┊
>> ╞┄┄╡
>> ┊ vm-2000-disk-2 ┊ 0┊ 1003┊ 26 GiB ┊ ok┊
>> ╰──╯
>>
>> # zfs list drbdpool/vm-2000-disk-2_0
>> NAMEUSED  AVAIL  REFER  MOUNTPOINT
>> drbdpool/vm-2000-disk-2_0  41,6G  6,87T  41,6G  -
>>
>> This is just after full resync (resource delete/create on this node).
>>
>> 41GB used for a 26GB volume isn't a bit much ?
>> Using zpool history, I can find the used line for this resource :
>> 2018-09-24.12:14:43 zfs create -s -V 27268840KB drbdpool/vm-2000-disk-2_0
> 
> 27,268,840 kiB is consistent with a 26 GiB DRBD 9 device for 8 peers, so
> the reason for the effective size of ~42 GiB would probably be outside
> of LINSTOR, unless there was some resize operation in progress that did
> not finish.

Probably, yes. I'll investigate on the ZFS side.

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] linstor-proxmox : online grow of resources

2018-09-24 Thread Julien Escario
Hello,
When trying to resize a disk (grow only) in the Proxmox interface for a
linstor-backed device, this error is thrown :
VM 2000 qmp command 'block_resize' failed - Cannot grow device files (500)

BUT the resource is effectively grown in linstor and the out-of-sync data is synced.
drbdpool/vm-2000-disk-1_0  27,2G  6,91T  27,2G  -

The VM is never informed of the resize and the Proxmox interface still displays
the old size.

Simply rebooting the VM makes it aware of the new size, even if the Proxmox
interface still displays the old size.

Offline grow (when the VM is stopped) works perfectly fine, even if the resource
was resized online previously (10G at creation + 5G online + 5G offline shows 20G
in the interface).

One last remark, still for the same resource : ZFS shows a much larger volume :
╭──────────────────────────────────────────────────────────╮
┊ ResourceName   ┊ VolumeNr ┊ VolumeMinor ┊ Size   ┊ State ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ vm-2000-disk-2 ┊ 0        ┊ 1003        ┊ 26 GiB ┊ ok    ┊
╰──────────────────────────────────────────────────────────╯

# zfs list drbdpool/vm-2000-disk-2_0
NAMEUSED  AVAIL  REFER  MOUNTPOINT
drbdpool/vm-2000-disk-2_0  41,6G  6,87T  41,6G  -

This is just after full resync (resource delete/create on this node).

Isn't 41GB used for a 26GB volume a bit much ?
Using zpool history, I can find the creation line for this resource :
2018-09-24.12:14:43 zfs create -s -V 27268840KB drbdpool/vm-2000-disk-2_0

= 26GB.

But perhaps it is more related to the ZFS configuration (ashift for example).
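
A few properties worth checking on that zvol to explain the overhead (a sketch,
using the dataset name from above) :

# zfs get volsize,volblocksize,used,referenced,refreservation,compression drbdpool/vm-2000-disk-2_0
# zpool get ashift drbdpool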

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server 0.6.3, Linstor-client 0.6.1 release

2018-09-24 Thread Julien Escario
Le 24/09/2018 à 10:33, Rene Peinthor a écrit :
> Thanks for reporting, this is a bug in the thin driver.
> It wasn't probably marked as a thin driver and will still do full resync, will
> be fixed in the next version.

Nice to hear !
Is there anything on my side I can do to try to fix this and/or confirm the bug ?
(force the thin mark somewhere ?)

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server 0.6.3, Linstor-client 0.6.1 release

2018-09-24 Thread Julien Escario
Le 24/09/2018 à 10:12, Robert Altnoeder a écrit :
> On 09/24/2018 09:43 AM, Julien Escario wrote:
>> Did I miss something or the ZFSthin storage plugin is only for thin
>> for the creation node ?
> 
> Is the resource using thin provisioning on all of the nodes?
> 
> Mixing fat provisioning on some nodes and thin provisioning on other
> nodes for the same volume of a resource is not supported.

It seems so :
╭───────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node  ┊ Driver        ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ SupportsSnapshots ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ drbdpool    ┊ nodeA ┊ ZfsThinDriver ┊ drbdpool ┊ 6.93 TiB     ┊ 9.06 TiB      ┊ true              ┊
┊ drbdpool    ┊ nodeB ┊ ZfsThinDriver ┊ drbdpool ┊ 6.93 TiB     ┊ 9.06 TiB      ┊ true              ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯

Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server 0.6.3, Linstor-client 0.6.1 release

2018-09-24 Thread Julien Escario
Le 24/09/2018 à 07:42, Rene Peinthor a écrit :
> It is not possible to delete a storage pool that is still in use by some
> resources/volumes.

Thanks !
I removed all resources, then deleted the storage-pool and recreated it as zfsthin.

linstor storage-pool create zfsthin nodeA drbdpool drbdpool

Everything went fine, but when I create a resource, the initial sync still
happens.

Did I miss something, or is the ZFSthin storage plugin only thin on the
creation node ?
Because, as it is, thin provisioning isn't really used on the secondary node(s).

Best regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server 0.6.3, Linstor-client 0.6.1 release

2018-09-21 Thread Julien Escario
Le 11/09/2018 à 15:55, Rene Peinthor a écrit :
> On Tue, Sep 11, 2018 at 3:04 PM Julien Escario  <mailto:julien.esca...@altinea.fr>> wrote:
> 
> Le 10/09/2018 à 15:52, Rene Peinthor a écrit :
> And one question : is there a way to 'convert' an existing storage-pool 
> from zfs
> to zfsthin ?
> 
> 
> No userfriendly way, it might be possible with 1 or 2 direct database queries.

Hello,
Just about to test another method : what will happen if I run
# linstor storage-pool delete drbdpool nodeA
# linstor storage-pool create nodeA drbdpool zfsthinpool drbdpool

Will this delete my resources ? Or simply refuse to operate because resources
already exist ?

Thanks a lot,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Linstor-server 0.6.3, Linstor-client 0.6.1 release

2018-09-11 Thread Julien Escario
Le 10/09/2018 à 15:52, Rene Peinthor a écrit :
> Hi All!

Hello !

> After the two hot fix releases from last week, we have another release with
> mainly fixes, but not so dramatically and a new feature with supporting ZFS 
> thin
> storage pools.
> 
> ZFS thin storage pool use the same syntax as all the other storage pool 
> drivers
> e.g.:
> linstor storage-pool create zfsthin node zfsthinpool tank

Great !
One remark : perhaps you can update/close
https://github.com/LINBIT/linstor-server/issues/1 ?

And one question : is there a way to 'convert' an existing storage-pool from zfs
to zfsthin ?

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdmanage multiple VGs not supported

2018-09-04 Thread Julien Escario
Le 03/09/2018 à 11:57, abdollah karimnia a écrit :
> Dear all,
> 
> Is there any way to replicate  different VGs using drbdmanage? Currently we
> can add only one VG name (drbdpool) into /etc/drbdmanaged.cfg file. Seems
> that it is not possible to have different VGs to replicate.

Short summary of the answers : don't do that, and move on to Linstor.

Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Resource is 'Blocked: upper'

2018-08-29 Thread Julien Escario
Le 29/08/2018 à 12:00, Lars Ellenberg a écrit :
> Something that was fixed in April I think. You want to upgrade to 9.0.15 
> (or whatever is "latest" at the time someone else finds this in the 
> archives...)
> 
> Unfortunately you will have to reboot, likely more or less "hard".

Hum, thanks Lars. I never expected to hit a race-condition bug in DRBD : I read
most of the changelogs at each release and the bugs always seem to be in
very rare conditions.

Good to know, but I think I'll reboot the machine, move everything off it,
reinstall the latest versions of everything and migrate this cluster to linstor.

Thanks again,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Any way to jump over initial sync ?

2018-08-29 Thread Julien Escario
Many many thanks for the detailed procedure.

I'll try it in a few days with drbd9 and will let you know if something has to be
changed (mainly because resources are created on the fly on each side).

Julien

Le 29/08/2018 à 14:32, David Bruzos a écrit :
> Hi, I have lots of experience skipping the initial sync with ZFS zvols and
> drbd 8.4.x.  I have been using "drbdadm -- --clear-bitmap new-current-uuid <res>"
> for years and never had a problem.  This is why it works:
> 
> 1. New ZFS volumes are guaranteed to return only zero data for unwritten
> blocks, so two new volumes are always in sync, if they have not been written to.
> 2. Also, if you have a VM base image on two hosts, you can clone the volumes in
> the image on each host and also skip the initial sync, because both clones will
> be identical.  Of course, I am assuming that the base image was replicated via
> ZFS streams.
> 3. LVM volumes and most other volume types will not work well, because they
> don't guarantee new volumes to be zero-filled.  However, depending on your use
> case, it is often better to zero-fill your volumes manually (e.g. cat /dev/zero
> >/dev/vg/vol0) and skip the sync.  It does not seem reasonable, but given the
> storage and network characteristics at play, it could be much much better than
> doing an actual DRBD sync.
> 4. If in doubt, run a drbd verify until you feel confident in your process.
> 
> * Typical way to skip the sync (at least this is my proven method):
> 
> # drbdadm create-md <res>                           (do this on both nodes)
> # drbdadm up <res>                                  (do this on both nodes)
> # drbdadm -- --clear-bitmap new-current-uuid <res>  (do this on the secondary node)
> # drbdadm primary <res>                             (do this on the primary node)
> # cat /proc/drbd                                    (Enjoy!)
> 
> In my opinion, having to replicate multi-tb volumes is an incredible waste 
> of time and resources, if it can be safely avoided.  I've talked to many 
> people that patiently wait while their giant 4 TB VM volumes do their 
> initial sync and hog their environment's I/O in the process...
> 
> I hope this helps you and others out there who are looking for a better 
> way...
> 
> David
> 
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Any way to jump over initial sync ?

2018-08-28 Thread Julien Escario
Hello,
Just wanted to know : is there a way to get rid of the initial sync with linstor
and a ZFS backend ?
Right now, I have a 1TB volume to create and the initial sync is very long.

I think it's mostly due to the unavailability of thinly provisioned ZFS resources,
but perhaps there is a way to suspend the resync and ask the system to simply
consider both resources as sync'ed ?

Best regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] linstor-proxmox-2.9.0

2018-08-28 Thread Julien Escario
Le 24/08/2018 à 10:54, Roland Kammerer a écrit :
> Dear Proxmox VE users,
> 
> we released version 2.9.0 of the linstor-proxmox plugin.

Hello,
Just to let you know : today, I tried to deploy linstor-proxmox on an
IPv6-only server. Sadly, packages.linbit.com isn't IPv6 capable, so I had to
create a small (fake) reverse proxy for this host.


Not really a big concern but ... we're in 2018 ;-)

If someone wants to install packages from packages.linbit.com on an IPv6-only
server, feel free to ask me for access (it's just a matter of putting one line
in /etc/hosts), but BEWARE : you WILL have to check package signatures because
there's no way to guarantee that the packages are not corrupted ones.

Regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] linstor-satellite in 'Connected' state

2018-08-27 Thread Julien Escario
Ok, just forget it : it was just a matter of NOT running MTU 9000 across switch
interfaces that are set to MTU 1500.
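
(For the record, a quick way to catch this kind of MTU mismatch is to send
non-fragmentable jumbo-sized pings across the path, e.g. towards the peer
address used here :)

# 8972 = 9000 bytes MTU minus 28 bytes of IP/ICMP headers ; -M do forbids fragmentation
ping -M do -s 8972 10.10.201.2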

*ashamed*

Julien

Le 27/08/2018 à 20:04, Julien Escario a écrit :
> Hello,
> I'm continuing side lab with linstor.
> 
> I recently moved connection between two nodes to a VLAN interface. Same 
> adresses
> for each were moved to the new VLAN.
> 
> Both nodes were restarted (dunno remember the order).
> 
> One node is Controller+ satellite (dedie83) and other is satellite only 
> (dedie82).
> 
> Now, on satellite only, I'm getting this log line almost every second :
> dedie82 Satellite[11196]: 19:54:54.566 [MainWorkerPool-17] INFO
> LINSTOR/Satellite - Controller connected and authenticated
> 
> Nothing more (
> 
> And :
> # linstor node list
> ╭──╮
> ┊ Node┊ NodeType  ┊ IPs┊ State ┊
> ╞┄┄╡
> ┊ dedie82 ┊ SATELLITE ┊ 10.10.201.1(PLAIN) ┊ Connected ┊
> ┊ dedie83 ┊ SATELLITE ┊ 10.10.201.2(PLAIN) ┊ Online┊
> ╰──╯
> 
> Resources aren't coming up (without surprise).
> 
> systemctl restart linstor-satellite on dedie82 doesn't resolve the status.
> 
> Does it has something to do with the fact I moved to a VLAN interface between
> nodes ?
> 
> Off course, I could delete the whole cluster and start back but debugging this
> case would be more useful to understand Linstor internals.
> 
> Best regards,
> Julien
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
> 
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] linstor-satellite in 'Connected' state

2018-08-27 Thread Julien Escario
Hello,
I'm continuing side lab with linstor.

I recently moved the connection between the two nodes to a VLAN interface. The
same addresses were moved to the new VLAN.

Both nodes were restarted (I don't remember the order).

One node is controller + satellite (dedie83) and the other is satellite only
(dedie82).

Now, on the satellite-only node, I'm getting this log line almost every second :
dedie82 Satellite[11196]: 19:54:54.566 [MainWorkerPool-17] INFO
LINSTOR/Satellite - Controller connected and authenticated

Nothing more (

And :
# linstor node list
╭──────────────────────────────────────────────────────╮
┊ Node    ┊ NodeType  ┊ IPs                ┊ State     ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ dedie82 ┊ SATELLITE ┊ 10.10.201.1(PLAIN) ┊ Connected ┊
┊ dedie83 ┊ SATELLITE ┊ 10.10.201.2(PLAIN) ┊ Online    ┊
╰──────────────────────────────────────────────────────╯

Resources aren't coming up (unsurprisingly).

systemctl restart linstor-satellite on dedie82 doesn't resolve the status.

Does it have something to do with the fact that I moved to a VLAN interface
between the nodes ?

Of course, I could delete the whole cluster and start over, but debugging this
case would be more useful to understand Linstor internals.

Best regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Resource is 'Blocked: upper'

2018-08-27 Thread Julien Escario
Le 27/08/2018 à 18:15, Julien Escario a écrit :
> Le 27/08/2018 à 17:44, Lars Ellenberg a écrit :
>> On Mon, Aug 27, 2018 at 05:01:52PM +0200, Julien Escario wrote:
>>> Hello,
>>> We're stuck in a strange situation. One of our ressources is marked as :
>>> volume 0 (/dev/drbd155): UpToDate(normal disk state) Blocked: upper
>>>
>>> I used drbdtop to get this info because drbdadm hangs.
>>>
>>> I can also see a drbdsetup process blocked :
>>> drbdsetup disk-options 155 --set-defaults --read-balancing=prefer-local
>>> --al-extents=6481 --al-updates=no --md-flushes=no
>>
>> maybe check /proc/<pid>/stack to see where it blocks.
>> Also, what DRBD (kernel) version is this?
> 
> # cat /proc/drbd
> version: 9.0.12-1 (api:2/proto:86-112)
> GIT-hash: 7eb4aef4abbfba8ebb1afbcc30574df74db0063e build by root@vm9, 
> 2018-03-27
> 15:55:44
> Transports (api:16): tcp (9.0.12-1)
> 
> 
> # cat /proc/22538/stack
> [] drbd_al_shrink+0xd7/0x1a0 [drbd]
> [] drbd_adm_disk_opts+0x2b2/0x580 [drbd]
> [] genl_family_rcv_msg+0x203/0x3f0
> [] genl_rcv_msg+0x4c/0x90
> [] netlink_rcv_skb+0xec/0x120
> [] genl_rcv+0x28/0x40
> [] netlink_unicast+0x192/0x230
> [] netlink_sendmsg+0x2d2/0x3c0
> [] sock_sendmsg+0x3e/0x50
> [] sock_write_iter+0x85/0xf0
> [] new_sync_write+0xe7/0x140
> [] __vfs_write+0x29/0x40
> [] vfs_write+0xb5/0x1a0
> [] SyS_write+0x55/0xc0
> [] entry_SYSCALL_64_fastpath+0x24/0xab
> [] 0x

Just got another interesting one, perhaps closer to the root of the problem :
the process is named [drbd_r_vm-102-d], so it is the kernel receiver for the
vm-102 resource.

Its stack file is :
# cat /proc/12024/stack
[] conn_disconnect+0x667/0x7f0 [drbd]
[] drbd_receiver+0x1cc/0x690 [drbd]
[] drbd_thread_setup+0x70/0x160 [drbd]
[] kthread+0x10c/0x140
[] ret_from_fork+0x35/0x40
[] 0x

What is this process waiting for ? No clue so far.

I'm trying to find a way to 'unblock' this process (by giving it what it is
waiting for) but perhaps it is impossible.

I need more in-depth kernel knowledge.

Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Resource is 'Blocked: upper'

2018-08-27 Thread Julien Escario
Le 27/08/2018 à 17:44, Lars Ellenberg a écrit :
> On Mon, Aug 27, 2018 at 05:01:52PM +0200, Julien Escario wrote:
>> Hello,
>> We're stuck in a strange situation. One of our ressources is marked as :
>> volume 0 (/dev/drbd155): UpToDate(normal disk state) Blocked: upper
>>
>> I used drbdtop to get this info because drbdadm hangs.
>>
>> I can also see a drbdsetup process blocked :
>> drbdsetup disk-options 155 --set-defaults --read-balancing=prefer-local
>> --al-extents=6481 --al-updates=no --md-flushes=no
> 
> maybe check /proc/<pid>/stack to see where it blocks.
> Also, what DRBD (kernel) version is this?

# cat /proc/drbd
version: 9.0.12-1 (api:2/proto:86-112)
GIT-hash: 7eb4aef4abbfba8ebb1afbcc30574df74db0063e build by root@vm9, 2018-03-27
15:55:44
Transports (api:16): tcp (9.0.12-1)


# cat /proc/22538/stack
[] drbd_al_shrink+0xd7/0x1a0 [drbd]
[] drbd_adm_disk_opts+0x2b2/0x580 [drbd]
[] genl_family_rcv_msg+0x203/0x3f0
[] genl_rcv_msg+0x4c/0x90
[] netlink_rcv_skb+0xec/0x120
[] genl_rcv+0x28/0x40
[] netlink_unicast+0x192/0x230
[] netlink_sendmsg+0x2d2/0x3c0
[] sock_sendmsg+0x3e/0x50
[] sock_write_iter+0x85/0xf0
[] new_sync_write+0xe7/0x140
[] __vfs_write+0x29/0x40
[] vfs_write+0xb5/0x1a0
[] SyS_write+0x55/0xc0
[] entry_SYSCALL_64_fastpath+0x24/0xab
[] 0x

That doesn't ring a bell for me ... VFS, network ?

Best regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Resource is 'Blocked: upper'

2018-08-27 Thread Julien Escario
Hello,
We're stuck in a strange situation. One of our resources is marked as :
volume 0 (/dev/drbd155): UpToDate(normal disk state) Blocked: upper

I used drbdtop to get this info because drbdadm hangs.

I can also see a drbdsetup process blocked :
drbdsetup disk-options 155 --set-defaults --read-balancing=prefer-local
--al-extents=6481 --al-updates=no --md-flushes=no

And the disks are OK. Some other resources working 'fine' are stored on the same
ZFS pool.

zpool status also reports all disks as fine.

But here the problem is that, (probably) due to this drbdsetup process, the
.drbdctrl volume is held as primary.
We rebooted the other node but it can't configure its resources because (I
think) it can't manage to get a lock on the .drbdctrl volume.

I know, it's drbdmanage, old technology, etc ... but I would like to know how I
can get rid of this 'blocked' state without a reboot.
I have the feeling that it could be useful with linstor too in the near future.

Any low-level command to reset this state and shut down the resource to unblock
the whole thing ?

Best regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] First Linstor bug encountered

2018-08-21 Thread Julien Escario


Le 21/08/2018 à 18:39, Robert Altnoeder a écrit :
> On 08/21/2018 06:23 PM, Julien Escario wrote:
>> Hello,
>> Just hit a bug after multiple creation/deletion of resources on my two nodes
>> cluster.
>>
>> Syslog reports :
>>
>> Aug 21 17:31:28 dedie83 Satellite[15917]: 17:31:28.828 [MainWorkerPool_0016]
>> ERROR LINSTOR/Satellite - Problem of type 'java.lang.NullPointerException'
>> logged to report number 5B770066-00
>> Aug 21 17:31:28 dedie83 Satellite[15917]: 17:31:28.833 [MainWorkerPool_0016]
>> ERROR LINSTOR/Satellite - Access to deleted resource [Report number
>> 5B770066-01]
> 
> In that case, we'd certainly be interested in getting the
> /opt/linstor-server/logs/ErrorReport-5B770066-01.log file.
> Looks like a satellite attempted to work with some data of a resource
> that it had declared deleted before.

Right but it would be useful to understand why this happened.

Here is the REAL error log I think (some obscure Java error ;-):

https://framabin.org/p/?645b441b2abeefd1#disCSMbiaa1NAiFwTLr4iXZ414h2bjMlfmGA/MdMP3k=

And the one you asked for (Access to deleted resource) :

https://framabin.org/p/?772e59e80450a5fa#nl4PG6/tkx2yjTUDXA9AsUxls18TkAgCr8Ee76Y5ja8=

> For recovery, disconnecting/reconnecting the Satellite (or just
> restarting it) should suffice. It should normally also retry the
> resource deletion afterwards.

Right, restarting linstor-satellite resets everything to the right state.

I had to issue :
systemctl restart linstor-satellite

but on BOTH nodes.

> I noticed the resource is in the "Primary" role. It might still be in
> use, and in that case, LINSTOR can not successfully delete the resource,
> because the layers below LINSTOR do not support that.

The resource was primary on the surviving node. I tried to delete it on the
secondary node.
Trying to delete the primary resource failed with 'Resource is mounted/in use.'.
That's the correct behavior, I think ;-)

Even if there has been a little glitch in the force, the cluster is now back in a
correct state for me, and without rebooting ; that's far better than drbdmanage's
dbus crashes ;-)

Best regards,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] First Linstor bug encountered

2018-08-21 Thread Julien Escario
Hello,
Just hit a bug after multiple creations/deletions of resources on my two-node
cluster.

Syslog reports :
Aug 21 17:31:28 dedie83 kernel: [350254.337961] drbd vm-102-disk-5 dedie82:
Preparing remote state change 63686478
Aug 21 17:31:28 dedie83 kernel: [350254.338166] drbd vm-102-disk-5 dedie82:
Committing remote state change 63686478 (primary_nodes=1)
Aug 21 17:31:28 dedie83 kernel: [350254.338170] drbd vm-102-disk-5 dedie82:
conn( Connected -> TearDown ) peer( Secondary -> Unknown )
Aug 21 17:31:28 dedie83 kernel: [350254.338172] drbd vm-102-disk-5/0 drbd1009
dedie82: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Aug 21 17:31:28 dedie83 kernel: [350254.338185] drbd vm-102-disk-5 dedie82:
ack_receiver terminated
Aug 21 17:31:28 dedie83 kernel: [350254.338187] drbd vm-102-disk-5 dedie82:
Terminating ack_recv thread
Aug 21 17:31:28 dedie83 kernel: [350254.338471] drbd vm-102-disk-5/0 drbd1009:
new current UUID: 6E533D317A5115E9 weak: FFFE
Aug 21 17:31:28 dedie83 Satellite[15917]: 17:31:28.828 [MainWorkerPool_0016]
ERROR LINSTOR/Satellite - Problem of type 'java.lang.NullPointerException'
logged to report number 5B770066-00
Aug 21 17:31:28 dedie83 Satellite[15917]: 17:31:28.833 [MainWorkerPool_0016]
ERROR LINSTOR/Satellite - Access to deleted resource [Report number
5B770066-01]
Aug 21 17:31:28 dedie83 kernel: [350254.398545] drbd vm-102-disk-5 dedie82:
Connection closed
Aug 21 17:31:28 dedie83 kernel: [350254.398570] drbd vm-102-disk-5 dedie82:
conn( TearDown -> Unconnected )
Aug 21 17:31:28 dedie83 kernel: [350254.398575] drbd vm-102-disk-5 dedie82:
Restarting receiver thread
Aug 21 17:31:28 dedie83 kernel: [350254.398577] drbd vm-102-disk-5 dedie82:
conn( Unconnected -> Connecting )


Command that triggered this :
linstor resource delete dedie82 vm-102-disk-5

The VM 102 resource is stuck :
vm-102-disk-5 role:Primary
  disk:UpToDate
  dedie82 connection:Connecting

I was able to issue drbdadm disconnect vm-102-disk-5, so right now the state is
StandAlone.

There was some load at the time. I was optimizing NVMe speed (so I deleted and
re-created resources a few times to compare with the connection and without).

All other resources are fine.

drbdadm adjust put the resource in Connecting state.

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] First test lab with drbd9 + linstor + Proxmox

2018-08-21 Thread Julien Escario
Le 20/08/2018 à 18:21, Robert Altnoeder a écrit :
> On 08/20/2018 05:03 PM, Julien Escario wrote:
>> My question was essentially about thinprov with ZFS. In recent versions 
>> (2018), drbdmanage is able to skip full resync at volume creation with 
>> ZvolThinLv2 plugin at least).
> 
> I don't remember the exact status right now, it is planned, but I think it
> is not implemented yet. It might even work already, but I am pretty sure
> it's not tested yet.

It seems you already answered that :
https://github.com/LINBIT/linstor-server/issues/1

Sorry for double-asking. Let's wait for an update of the issue on GitHub.

> Anyway, developer resources are limited, so as of now, we still have a 
> rather typical command line client (it's actually a heavily adapted variant
> of the drbdmanage client). Using this client, people apparently had a hard
> time figuring out which objects they could create and where to start, and
> so the command line client was restructured to work with multiple levels of
> subcommands, where the first level is an object, not an action. This was
> done in order to make the list of top-level commands shorter.
> 
> That's how you get "storage-pool create" instead, and sometimes there are
> leftovers in the documentation from before this change. We'll fix that in
> the documentation.

We really do not have a big concern here. Just to let you know.
It's just about having to learn a new command's syntax and internals, nothing
more. It happens almost every day in our kind of jobs.

>> Another one spotted right now : "6.5. Makinging the Controller
>> Highly-Availible" Should be 'making', I think.
> 
> Yes, and while we're at it, let's make this "Available". Looks like we got
> 2 typos on one line here, must have been one of those days... ;)

Erf, I just copied the second typo without seeing it. Nice catch !

>> So I think equivalent of assign/unassign of drbdmanage is :
>> 
>> # linstor resource delete   --storage-pool  #
>> linstor resource create   --storage-pool 
> 
> Essentially yes, I think delete does not take the --storage-pool parameter
> though.

True. Useless parameter.
Let's try to move a resource from one storage pool to another inside a node, to
see if we can manually place resources, since proxmox-linstor can't do it at the moment.
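
Untested sketch of what I mean, with the node/resource names from above and a
made-up target pool name (the local copy is dropped and then resynced from the peer) :
# linstor resource delete dedie82 vm-102-disk-5
# linstor resource create dedie82 vm-102-disk-5 --storage-pool pool_nvme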

Thanks for all those details.

As always, I'll try to give you as much feedback as possible (OSS state of mind).

Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] First test lab with drbd9 + linstor + Proxmox

2018-08-20 Thread Julien Escario
On 20/08/2018 at 16:43, Roland Kammerer wrote:
>> I'm still missing a few things like ZFS Thin provisionning (When I create
>> a new disk, a full resync is initiated). Did I miss something ? Is it
>> planned ?
> 
> You used a LVM pool, so yes, you get a full sync. Use a LVM-thinpool and 
> the full sync gets skipped.

My question was essentially about thin provisioning with ZFS. In recent versions
(2018), drbdmanage is able to skip the full resync at volume creation (with the
ZvolThinLv2 plugin at least).

>> Also, in my test, when creating multiple resources simultaneously, I
>> ended up with a disconnect resource at creation (probably because network
>> hasn't been reacheable in time due to interface saturation) :
>> 
>> vm-101-disk-5 ┊ dedie82 ┊ 7010 ┊  Unknown
>> 
>> vm-101-disk-5 role:Secondary disk:UpToDate dedie82 connection:Connecting
>> 
>> 
>> Is there already some kind of disconnect/reconnect command with linstore
>> ? (I didn't manage to found it).
> 
> No, if something needs fixing on the lower level, you then use lower level
> commands (drbdadm/drbdsetup). But this should not happen in the first
> place. You could try to restart the linstor-satellite service and check if
> it recovers.

Got my answer after an in-depth search through the linstor commands :

# linstor resource create dedie82 vm-101-disk-5 --storage-pool pool_hybrid

This forced creation of the resource on the second node.
So I think the equivalent of drbdmanage's assign/unassign is :

# linstor resource delete   --storage-pool 
# linstor resource create   --storage-pool 

That should do the trick to achieve "data rebalancing" with linstor.
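
A slightly safer order (keeping redundancy during the move ; node and pool names
are just the ones from my lab) would be something like :
# linstor resource create dedie83 vm-101-disk-5 --storage-pool pool_hybrid
# linstor resource list   # wait until the new replica reports UpToDate
# linstor resource delete dedie82 vm-101-disk-5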

Thanks for your help,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] First test lab with drbd9 + linstor + Proxmox

2018-08-20 Thread Julien Escario
Hello,
I recently started my first lab to test linstor as a replacement for drbdmanage.

My lab env :
2 Proxmox nodes with 128GB RAM, 5 2TB HDD and 2 Optane 900P
ZFS backend storage (for use with Optane as cache)
'Only' 1Gbps network card (probably the future bottleneck for sync)

It seems to be a really great piece of software. Adding a node was really easy
compared to drbdmanage.
Also, not having to deploy 2 LVM resources for the drbdctrl volume is a plus
because I'm using btrfs for the OS storage. So, no LVM at all.

Installation went fine. Just a little typo in the manual, chapter 4.8 (Storage
pools) :
linstor create-storage-pool pool_ssd alpha lvm vg_ssd

The correct syntax seems to be :
linstor storage-pool create alpha pool_ssd lvm vg_ssd

Another one spotted right now : "6.5. Makinging the Controller Highly-Availible"
Should be 'making', I think.

I'm still missing a few things like ZFS thin provisioning (when I create a
new disk, a full resync is initiated). Did I miss something ? Is it planned ?

Also, in my test, when creating multiple resources simultaneously, I ended up
with a disconnected resource at creation (probably because the network wasn't
reachable in time due to interface saturation) :

vm-101-disk-5 ┊ dedie82 ┊ 7010 ┊  Unknown

vm-101-disk-5 role:Secondary
  disk:UpToDate
  dedie82 connection:Connecting


Is there already some kind of disconnect/reconnect command with linstor ? (I
didn't manage to find it.)

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] linstor-proxmox-2.8

2018-07-26 Thread Julien Escario
On 26/07/2018 at 15:28, Roland Kammerer wrote:
> Dear Proxmox VE users,
> 
> we released the first version of the linstor-proxmox plugin. This 
> integrates LINSTOR (the successor of DRBDManage) into Proxmox.
> 
> It contains all the features the drbdmanage-proxmox plugin had (i.e., 
> creating/deleting volumes with a configurable redundancy, VM 
> live-migration, resizing, snapshots (limited as by the design differences
> between DRBD and Proxmox).

Well, that's great.

> The missing feature is proper size reporting. Reporting meaningful values
> for thinly allocated storage is a TODO in LINSTOR itself, so currently the
> plugin reports 8 out of 10TB as free storage. Always. Besides that it
> should be complete.

Huh ? Not certain I understand : 8 out of 10 (aka 80% ?) or ... why 10TB ?

> The brave find the source code here (be careful, it is Perl, your eyes may
> start bleeding ;-) ): https://github.com/LINBIT/linstor-proxmox

Wow, you're going a bit too far, Roland ;-) Perl's not the horror you describe,
just an undead language. This will change with Perl 7, you'll see !
(sarcasm inside).

Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD9, Linstor and Proxmox VE

2018-07-18 Thread Julien Escario
Hello,
May we have an update on Linstor's status ? Is it ready for production use ?
Only with a Linbit support contract ?

And a more difficult question : do you intend to provide a Linstor plugin for
Proxmox VE ? I know you had some concerns with Proxmox in the past but as far
as I can see, the main user base of DRBD is using it for Proxmox and Openstack
backend storage.

On the Linstor product page, I can read : "OpenStack, OpenNebula, Proxmox (Q4
2018)". Is the support already done for OpenStack and OpenNebula, or can I
consider that support for all platforms will be ready by the end of the year ?

Best regards and thanks for clarification,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Randomly crash drbdsetup

2018-06-28 Thread Julien Escario
Hello Roland,
First, thanks for your concern about my question.

On 28/06/2018 at 08:41, Roland Kammerer wrote:
> On Wed, Jun 27, 2018 at 12:37:20PM +0200, Julien Escario wrote:
>> Hello,
>> We're experiencing a really strange situation.
>> We often play with :
>> drbdmanage peer-device-options --resource  --c-max-rate 
>>
>> especially when a node crash and need a (full) resync.
>>
>> When doing this, sometimes (after 10 or 20 such commands), we end up with
>> drbdmanage completely stuck and a drbdsetup that seems to block on an IO with
>> returning.
>> For example :
>> drbdsetup disk-options 144 --set-defaults --read-balancing=prefer-local
>> --al-extents=6481 --al-updates=no --md-flushes=no
>>
>> drbdadm status display ressource up to this one then hangs on drbdsetup call.
>>
>> drbdtop is still usable.
>>
>> Right now, we didn't manage to find a solution without rebooting the node 
>> (sadly).
>>
>> Do you experience such situation ?
>> What can cause this ?
> 
> What version of DRBD9 is that (cat /proc/drbd)? drbdsetup hangs for a
> reason, kernel related, not an actual bug in drbdsetup. "dmesg" at that
> time would be interesting. Yes, I saw that, but not recently, only with
> by now pretty old versions of DRBD9.

I posted a full dump of kernel messages here :
https://framabin.org/p/?988ac2e36beabde6#a6UmS1uK/idqlPCCgoIar8oeEcNjRf8kCmdlECPu+V4=

Versions :
cat /proc/drbd
version: 9.0.12-1 (api:2/proto:86-112)
Transports (api:16): tcp (9.0.12-1)

drbd-utils  9.3.0-1
drbdmanage-proxmox  2.1-1

Is that too old ?

Can my problem be caused by a two-node-only setup ? Are 3 nodes the required
minimum for correct operation ? (even though I'm aware it's the recommended setup).

>> Is there a way to unblock this process without rebooting ?
> 
> Depends, but as a rule of thumb: when that happens the kernel is already
> in a state where you want to/have to reboot.

We can also see memory errors automatically corrected by ECC :
Jun 28 04:02:37 vm8 kernel: [159134.557190] {24}[Hardware Error]: Hardware error
from APEI Generic Hardware Error Source: 1
Jun 28 04:02:37 vm8 kernel: [159134.557191] {24}[Hardware Error]: It has been
corrected by h/w and requires no further action
Jun 28 04:02:37 vm8 kernel: [159134.557192] {24}[Hardware Error]: event
severity: corrected
Jun 28 04:02:37 vm8 kernel: [159134.557193] {24}[Hardware Error]:  Error 0,
type: corrected
Jun 28 04:02:37 vm8 kernel: [159134.557195] {24}[Hardware Error]:  fru_text:
CorrectedErr
Jun 28 04:02:37 vm8 kernel: [159134.557197] {24}[Hardware Error]:
section_type: memory error
Jun 28 04:02:37 vm8 kernel: [159134.557198] {24}[Hardware Error]:   node: 0
device: 1
Jun 28 04:02:37 vm8 kernel: [159134.557200] {24}[Hardware Error]:   error_type:
2, single-bit ECC

Perhaps related, I don't know. The drbdsetup process also hangs on the 'sane' node.

But a night of memtest (3 complete passes) didn't detect any error.

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdsetup (again) stuck on blocking I/O

2018-06-27 Thread Julien Escario
Wow, my mails finally made it to the list ... forget it, it's redundant with
today's thread.

Julien

On 22/06/2018 at 14:39, Julien Escario wrote:
> Hello, DRBD9 is really a great piece of software but from time to time, we
> end stuck in a situation without other solution than reboot.
> 
> For exemple, right now, when we run : # drdbadm status It display some
> ressources than hang on a specific ressource and finally returns "Command
> 'drbdsetup status' did not terminate within 5 seconds".
> 
> And drdsetup processus stacks. drbdmanage is completely out of order on
> both nodes (see below).
> 
> Running drbdsetup status with strace runs until problematic ressource and
> displays :
>> write(3,
>> "4\0\0\0\34\0\1\3\227\251,[\330f\0\0\37\2\0\0\377\377\377\377\0\0\0\0\30\0\2\
0"...,
>> 52) = 52 poll([{fd=1, events=POLLHUP}, {fd=3, events=POLLIN}], 2, 12)
>> = 1 ([{fd=3, revents=POLLIN}]) poll([{fd=3, events=POLLIN}], 1, -1)=
>> 1 ([{fd=3, revents=POLLIN}]) recvmsg(3, {msg_name={sa_family=AF_NETLINK,
>> nl_pid=0, nl_groups=}, msg_namelen=12,
>> msg_iov=[{iov_base=[{{len=720, type=0x1c /* NLMSG_??? */,
>> flags=NLM_F_MULTI, seq=1529653655, pid=26328},
>> "\37\2\0\0\244\0\0\0e\0\0\0 \0\2\0\10\0\1@\0\0\0\0\22\0\2@vm-1"...},
>> {{len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}}],
>> iov_len=8192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK) =
>> 720 poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3,
>> revents=POLLIN}]) recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0,
>> nl_groups=}, msg_namelen=12, msg_iov=[{iov_base=[{{len=720,
>> type=0x1c /* NLMSG_??? */, flags=NLM_F_MULTI, seq=1529653655, pid=26328},
>> "\37\2\0\0\244\0\0\0e\0\0\0 \0\2\0\10\0\1@\0\0\0\0\22\0\2@vm-1"...},
>> {{len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}}],
>> iov_len=8192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 720 
>> poll([{fd=1, events=POLLHUP}, {fd=3, events=POLLIN}], 2, 12) = 1
>> ([{fd=3, revents=POLLIN}]) poll([{fd=3, events=POLLIN}], 1, -1)= 1
>> ([{fd=3, revents=POLLIN}]) recvmsg(3, {msg_name={sa_family=AF_NETLINK,
>> nl_pid=0, nl_groups=}, msg_namelen=12,
>> msg_iov=[{iov_base=[{{len=20, type=NLMSG_DONE, flags=NLM_F_MULTI,
>> seq=1529653655, pid=26328}, "\0\0\0\0"}, {{len=164, type=0x65 /*
>> NLMSG_??? */, flags=0, seq=131104, pid=65544},
>> "\0\0\0\0\22\0\2\0vm-145-disk-1\0\0\0\330\0\3\0(\0\1\0"...}, {{len=6433,
>> type=0x5 /* NLMSG_??? */, flags=NLM_F_DUMP_INTR, seq=0, pid=1114117},
>> "\1\0\0\0\5\0\22\0\1\0\0\0\5\0\23\0\0\0\0\0\10\0\24\0\0\0\0\0\10\0\25\0"...},
>> {{len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}}],
>> iov_len=8192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK) =
>> 20 poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3,
>> revents=POLLIN}]) recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0,
>> nl_groups=}, msg_namelen=12, msg_iov=[{iov_base=[{{len=20,
>> type=NLMSG_DONE, flags=NLM_F_MULTI, seq=1529653655, pid=26328},
>> "\0\0\0\0"}, {{len=164, type=0x65 /* NLMSG_??? */, flags=0, seq=131104,
>> pid=65544}, "\0\0\0\0\22\0\2\0vm-145-disk-1\0\0\0\330\0\3\0(\0\1\0"...},
>> {{len=6433, type=0x5 /* NLMSG_??? */, flags=NLM_F_DUMP_INTR, seq=0,
>> pid=1114117},
>> "\1\0\0\0\5\0\22\0\1\0\0\0\5\0\23\0\0\0\0\0\10\0\24\0\0\0\0\0\10\0\25\0"...},
>> {{len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}}],
>> iov_len=8192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20 
>> write(3, "4\0\0\0\34\0\1\3\230\251,[\330f\0\0
>> \2\0\0\377\377\377\377\0\0\0\0\30\0\2\0"..., 52
> 
> drbdadm runs fine on node 2.
> 
> I don't exactly see how to interpret this.
> 
> Finally, I can see that node 1 is keeping the drbdctrl ressource as primary
> : something must have gone wrong on this node.
> 
> drbdtop actually runs correctly and shows for the problematic ressouce : 
> volume 0 (/dev/drbd164): UpToDate(normal disk state) Blocked: upper and : 
> Connection to node2(Unknown): NetworkFailure(lost connection to node2)
> 
> How can I debug such situation without rebooting node1 ?
> 
> This is not the time we're encountering such situation and rebooting each
> time is really a pain, we're talking of highly available clusters.
> 
> Any other info I can provide ?
> 
> Thanks a lot !
> 
> Best regards, Julien Escario
> 
> P.S. : drbdmanage output
> 
> On node 1 (actual drbdctrl primary) :
> 
> # drbdmanage r ERROR:dbus.proxies:Introspect error on :1.53:/interface: 
> dbus.exceptions.DBusException: org

[DRBD-user] Randomly crash drbdsetup

2018-06-27 Thread Julien Escario
Hello,
We're experiencing a really strange situation.
We often play with :
drbdmanage peer-device-options --resource  --c-max-rate 

especially when a node crashes and needs a (full) resync.

When doing this, sometimes (after 10 or 20 such commands), we end up with
drbdmanage completely stuck and a drbdsetup that seems to block on an I/O without
returning.
For example :
drbdsetup disk-options 144 --set-defaults --read-balancing=prefer-local
--al-extents=6481 --al-updates=no --md-flushes=no

drbdadm status displays resources up to this one, then hangs on the drbdsetup call.

drbdtop is still usable.

So far, we haven't managed to find any solution other than rebooting the node
(sadly).

Have you experienced such a situation ?
What can cause this ?
Is there a way to unblock this process without rebooting ?

Thanks a lot for your help,
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] drbdsetup (again) stuck on blocking I/O

2018-06-24 Thread Julien Escario
Hello,
DRBD9 is really a great piece of software, but from time to time we end up stuck in
a situation with no other solution than a reboot.

For example, right now, when we run :
# drbdadm status
It displays some resources, then hangs on a specific resource and finally returns
"Command 'drbdsetup status' did not terminate within 5 seconds".

And the drbdsetup process gets stuck. drbdmanage is completely out of order on both
nodes (see below).

Running drbdsetup status under strace runs until the problematic resource and
displays :
> write(3, 
> "4\0\0\0\34\0\1\3\227\251,[\330f\0\0\37\2\0\0\377\377\377\377\0\0\0\0\30\0\2\0"...,
>  52) = 52
> poll([{fd=1, events=POLLHUP}, {fd=3, events=POLLIN}], 2, 12) = 1 ([{fd=3, 
> revents=POLLIN}])
> poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=}, 
> msg_namelen=12, msg_iov=[{iov_base=[{{len=720, type=0x1c /* NLMSG_??? */, 
> flags=NLM_F_MULTI, seq=1529653655, pid=26328}, "\37\2\0\0\244\0\0\0e\0\0\0 
> \0\2\0\10\0\1@\0\0\0\0\22\0\2@vm-1"...}, {{len=0, type=0 /* NLMSG_??? */, 
> flags=0, seq=0, pid=0}}], iov_len=8192}], msg_iovlen=1, msg_controllen=0, 
> msg_flags=0}, MSG_PEEK) = 720
> poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=}, 
> msg_namelen=12, msg_iov=[{iov_base=[{{len=720, type=0x1c /* NLMSG_??? */, 
> flags=NLM_F_MULTI, seq=1529653655, pid=26328}, "\37\2\0\0\244\0\0\0e\0\0\0 
> \0\2\0\10\0\1@\0\0\0\0\22\0\2@vm-1"...}, {{len=0, type=0 /* NLMSG_??? */, 
> flags=0, seq=0, pid=0}}], iov_len=8192}], msg_iovlen=1, msg_controllen=0, 
> msg_flags=0}, 0) = 720
> poll([{fd=1, events=POLLHUP}, {fd=3, events=POLLIN}], 2, 12) = 1 ([{fd=3, 
> revents=POLLIN}])
> poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=}, 
> msg_namelen=12, msg_iov=[{iov_base=[{{len=20, type=NLMSG_DONE, 
> flags=NLM_F_MULTI, seq=1529653655, pid=26328}, "\0\0\0\0"}, {{len=164, 
> type=0x65 /* NLMSG_??? */, flags=0, seq=131104, pid=65544}, 
> "\0\0\0\0\22\0\2\0vm-145-disk-1\0\0\0\330\0\3\0(\0\1\0"...}, {{len=6433, 
> type=0x5 /* NLMSG_??? */, flags=NLM_F_DUMP_INTR, seq=0, pid=1114117}, 
> "\1\0\0\0\5\0\22\0\1\0\0\0\5\0\23\0\0\0\0\0\10\0\24\0\0\0\0\0\10\0\25\0"...}, 
> {{len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}}], iov_len=8192}], 
> msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK) = 20
> poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=}, 
> msg_namelen=12, msg_iov=[{iov_base=[{{len=20, type=NLMSG_DONE, 
> flags=NLM_F_MULTI, seq=1529653655, pid=26328}, "\0\0\0\0"}, {{len=164, 
> type=0x65 /* NLMSG_??? */, flags=0, seq=131104, pid=65544}, 
> "\0\0\0\0\22\0\2\0vm-145-disk-1\0\0\0\330\0\3\0(\0\1\0"...}, {{len=6433, 
> type=0x5 /* NLMSG_??? */, flags=NLM_F_DUMP_INTR, seq=0, pid=1114117}, 
> "\1\0\0\0\5\0\22\0\1\0\0\0\5\0\23\0\0\0\0\0\10\0\24\0\0\0\0\0\10\0\25\0"...}, 
> {{len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}}], iov_len=8192}], 
> msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
> write(3, "4\0\0\0\34\0\1\3\230\251,[\330f\0\0 
> \2\0\0\377\377\377\377\0\0\0\0\30\0\2\0"..., 52

drbdadm runs fine on node 2.

I don't exactly see how to interpret this.

Finally, I can see that node 1 is keeping the drbdctrl resource as primary :
something must have gone wrong on this node.

drbdtop actually runs correctly and shows, for the problematic resource :
volume 0 (/dev/drbd164): UpToDate(normal disk state) Blocked: upper
and :
Connection to node2(Unknown): NetworkFailure(lost connection to node2)

How can I debug such a situation without rebooting node1 ?

This is not the first time we're encountering such a situation, and rebooting each
time is really a pain ; we're talking about highly available clusters.

Any other info I can provide ?

Thanks a lot !

Best regards,
Julien Escario

P.S. : drbdmanage output

On node 1 (actual drbdctrl primary) :

# drbdmanage r
ERROR:dbus.proxies:Introspect error on :1.53:/interface:
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not
receive a reply. Possible causes include: the remote application did not send a
reply, the message bus security policy blocked the reply, the reply timeout
expired, or the network connection was broken.

Error: Cannot connect to the drbdmanaged process using DBus
The DBus subsystem returned the following error description:
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes
include: the remote application did not send a reply, the mess

Re: [DRBD-user] drbd-utils-9.3.0

2018-03-21 Thread Julien Escario
Yup, this is ABSOLUTELY boring ;-)

Thanks for your work !

Best regards,
Julien Escario

On 21/03/2018 at 11:10, Roland Kammerer wrote:
> Hi,
> 
> This drbd-utils release should be rather boring for most users. It
> accumulates the fixes in the 9.2.x branches and adds some additional
> fixes. Mainly for the combination of DRBD9 and DRBD Proxy. Additionally,
> we now have Japanese translations for DRBD9 man-pages and a fix for
> adjusting diskless nodes that need to delete a minor.
> 
> 9.3.0
> 
>  * update to Japanese man pages
>  * fixes for stacking in drbd-9.0
>  * fixes for proxy support in drbd-9.0
>  * fix adjusting --bitmap=no peer to diskfull
>  * VCS: typos and fixes for stacked resources
>  * fixes from 9.2.1 and 9.2.2
> 
> https://www.linbit.com/downloads/drbd/utils/drbd-utils-9.3.0.tar.gz
> https://github.com/LINBIT/drbd-utils/tree/v9.3.0
> 
> Regards, rck
> 
> 
> 
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
> 
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] ZFS storage backend failed

2018-02-21 Thread Julien Escario
On 21/02/2018 at 04:07, Igor Cicimov wrote:
> 
> 
> On Tue, Feb 20, 2018 at 9:55 PM, Julien Escario  <mailto:esca...@azylog.net>> wrote:
> 
> On 10/02/2018 at 04:39, Igor Cicimov wrote:
> > Did you tell it
> > to? 
> https://docs.linbit.com/doc/users-guide-84/s-configure-io-error-behavior/
> 
> <https://docs.linbit.com/doc/users-guide-84/s-configure-io-error-behavior/>
> 
> Sorry for the late answer : I moved on performance tests with a ZFS RAID1
> backend. I'll retry backend failure a little later.
> 
> But ... as far as I understand, 'detach' behavior should be the default 
> no ?
> 
> 
> ​I think the default is/was for DRBD to "pass-on" the error to the higher 
> layer
> that should decide it self how to handle it.

Perhaps, or rather I notice a strange parameter in the zpool attributes :
drbdpool  failmode   wait   default

failmode = wait ? That's something that could lead to the DRBD stack not being
informed of the zpool failure.
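
For the record, checking and changing that property is easy (possible values are
wait, continue or panic ; untested with DRBD on top) :
# zpool get failmode drbdpool
# zpool set failmode=continue drbdpool   # return EIO to upper layers instead of blocking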

So, 2 more things to test as soon as I have finished the horribly long list of
performance parameters.

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Correct way to benchmark IO on DRBD

2018-02-20 Thread Julien Escario
Hello,
I'm trying to benchmark *correctly* my lab setup.

Pretty simple : a 2-node Proxmox setup, protocol C, a ZFS RAID1 HDD backend with
mirrored log and cache on SSDs.
DRBD9, 10Gbps Ethernet network, latency tuned by reading a lot of papers on
this.

What I'm trying : run fio with the parameters below on 2 VMs running on hypervisor A
and 2 VMs running on hypervisor B.
The VMs are really simple Ubuntu 17.10 installs with 5G disks.

fio command line :
fio --filename=/tmp/test.dat --size=1G --direct=1 --rw=randrw --rwmixwrite=30
--refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=128k
--iodepth=16 --numjobs=1 --time_based --runtime=600 --group_reporting
--name=benchtest

Which means : 1G test file, 70% read, 30% write, block size 128k, iodepth 16.
I'm not really sure about the other parameters.

My final goal is to get a clue about how many VMs I will be able to run on those
hypervisors with a typical workload of ~500kB/s write and ~2MB/s read.

What would be *really* cool : the ability to instantiate a bunch of VMs running this
workload and see when the hypervisors overload. Even cooler : a dynamic workload
with a threshold (500kB/s at a time and +/-10% randomness one minute later).
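
A rough, untested sketch of what I have in mind : fio can rate-limit each job, so a
bunch of capped jobs could stand in for a bunch of VMs (numbers are just the target
workload above, 10 jobs ~ 10 VMs) :
fio --name=vmload --directory=/tmp/vmload --size=1G --direct=1 --rw=randrw \
    --rwmixwrite=30 --bs=128k --ioengine=libaio --iodepth=16 \
    --rate=2m,500k --numjobs=10 --time_based --runtime=600 --group_reporting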

Does anyone have an example of such a piece of code ?
How do you benchmark your disks for a 'real life' workload ?

Thank you !
Julien
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] ZFS storage backend failed

2018-02-20 Thread Julien Escario
On 10/02/2018 at 04:39, Igor Cicimov wrote:
> Did you tell it
> to? https://docs.linbit.com/doc/users-guide-84/s-configure-io-error-behavior/

Sorry for the late answer : I moved on to performance tests with a ZFS RAID1
backend. I'll retry the backend failure a little later.

But ... as far as I understand, 'detach' behavior should be the default no ?

My thought is that DRBD wasn't notified of, or didn't detect, the blocked I/Os on the
backend. Perhaps a specific behavior of ZFS.
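
In the meantime, I'll set it explicitly ; if I read the docs correctly, the relevant
bit in the resource (or common) section is simply :
disk {
    on-io-error detach;   # drop the failed backing device and continue diskless
}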

More tests to come.

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] ZFS storage backend failed

2018-02-09 Thread Julien Escario
Hello,
I'm just doing a lab with a zpool as the storage backend for DRBD (storing VM images
with Proxmox).

Right now, it's pretty good once tuned and I've been able to achieve 500MB/s
write speed, with just a little curiosity about concurrent writes from both
hypervisors in the cluster, but that's not the point here.

To complete the resiliency tests, I simply unplugged a disk from a node. My thought
was that DRBD was just going to detect the ZFS failure and detach the resources from
the failed device.

But ... nothing. I/O just hangs on the VMs running on the 'failed' node.

My zpool status :

NAMESTATE READ WRITE CKSUM
drbdpoolUNAVAIL  0 0 0  insufficient replicas
  sda   UNAVAIL  0 0 0

but drbdadm shows this for a locally hosted VM (on the failed node) :
vm-101-disk-1 role:Primary
  disk:UpToDate
  hyper-test-02 role:Secondary
peer-disk:UpToDate

and this for a remote VM (on the 'sane' node, from the failed node's point of view) :
vm-104-disk-1 role:Secondary
  disk:Consistent
  hyper-test-02 connection:NetworkFailure


So it seems that DRBD didn't detect the I/O failure.

Is there a way to force automatic failover in this case ? I probably missed a
detection mechanism.

Best regards,
Julien Escario

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Going Back to 8.4

2017-12-18 Thread Julien Escario
On 16/12/2017 at 17:26, Eric Robinson wrote:
> Over the past year, I’ve tried multiple times to make the switch to 9.X, and
> every time I keep running into the stupid “Cannot connect to the drbdmanaged
> process using DBus” error. I’ve tried it on RHEL 7.1 and SLES 12 SP2. Nobody
> seems to know was causes it, and Google is no help. Maybe the problem is my 
> own
> ignorance, which is very possible, but I’ve never had trouble with 8.4, so I’m
> dropping back to that. Just an FYI to the Linbit folks.

Hello,
The DRBD version is not really the problem here : you can still manage DRBD9 'the
old way' without drbdmanage, by creating resource files by hand and using
drbdsetup and drbdadm to configure them.
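
As a reminder, a minimal hand-written DRBD9 resource file looks roughly like this
(names, devices and addresses are placeholders) ; then drbdadm create-md and
drbdadm up on each node :
resource r0 {
    device     /dev/drbd0;
    disk       /dev/vg0/r0;
    meta-disk  internal;
    on alpha {
        node-id 0;
        address 10.0.0.1:7000;
    }
    on bravo {
        node-id 1;
        address 10.0.0.2:7000;
    }
    connection-mesh {
        hosts alpha bravo;
    }
}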

Never mind, with the announcement of drbdmanage's replacement by linstor, it seems that
even Linbit will drop drbdmanage support in a few weeks/months. I think we can
agree that drbdmanage has some design flaws, even if we have been running a
3-node cluster for a few months with pretty good results once we worked around the
caveats.

From my point of view, you should consider using drbd9 (with good support for
lots of resources, 32-node support and data rebalancing), but without
drbdmanage.

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] multiple pools

2017-12-12 Thread Julien Escario
On 12/12/2017 at 11:54, Robert Altnoeder wrote:
> On 12/12/2017 11:10 AM, Julien Escario wrote:
> 
>> Hello,
>> May we have a pointer to linstor informations ? I can't find any info on this
>> software by googling 5 min.
>>
>> Best regards,
>> Julien Escario
> That's because no information about the project has been publicly
> released so far.
> 
> A very concise overview is:
> - It is a completely new design and implementation meant as a
> replacement for the existing drbdmanage
> - It's a two-component system comprising a controller and a satellite
> component
> - All communication is through TCP/IP (no control volume, no D-Bus),
> plain or encrypted (SSL/TLS)
> - It supports multiple storage pools
> - It does not keep persistent information on DRBD's state
> - Instead, it tracks DRBD state changes and makes decisions based on
> what state the external environment is in
> - The configuration is kept in an embedded SQL database
> - It's a parallelized system (multiple nodes can run multiple actions
> concurrently)
> - It has very extensive logging and error reporting to make tracking
> problems as easy as possible
> - It has multiuser-security (different strength levels can be configured
> as required)
> - The controller and satellite components are implemented in Java
>   (currently Java 7 compatible, with plans to move to Java 8 in the future)
> - The first command-line client for it is still written in Python
> - It's currently still in a very early stage (an experimental version
> for LINBIT-internal tests will be ready within a few days)
> - There are currently three developers working full-time on it, with a
> fourth one joining in 2018

Sounds promising !
I would just have a reservation about using Java as the main language : it's always
been a nightmare to get a working version of the JRE. There are a lot of versions and
implementations depending on the running OS.

And it's really heavy on RAM, even for a 'hello world'.

From my point of view, it doesn't really seem to be a wise choice. A modern
language like Go, Python, Ruby, etc. could have been far more future-proof.

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] multiple pools

2017-12-12 Thread Julien Escario
On 12/12/2017 at 10:06, Roland Kammerer wrote:
> On Mon, Dec 11, 2017 at 08:24:26PM +, Henning Svane wrote:
>> Hi
>> Is it possible in version 9 to have multiple pools.
>>
>> I would like to make a configuration with 2 pools.
> 
> You have to be a bit more specific on that...
> 
> - For manual configuration you put in the res files whatever you like.
>   One backing device from pool A for resA, one from B for resB.
> - drbdmanage does not support multiple pools.
> - linstor (the project that will eventually replace drbdmanage) will
>   support that.

Hello,
May we have a pointer to linstor information ? I can't find any info on this
software after 5 minutes of googling.

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd usage

2017-12-12 Thread Julien Escario
On 07/12/2017 at 10:36, Sebastian Blajszczak wrote:
> Hello,
> 
>  
> 
> I´m running proxmox 5.1 with drbd 9.0.9-1 on two nodes with drbdmanage 
> 0.99.14.
> 
> I have the problem that drbd shows me a full usage of 8.74TiB. But I only have
> on each server 4.36t and when I´m calculating the VM volumes I have only 2.4t 
> in
> use.

Hello,
I can see you're using thin LVM. AFAIK the usage report is based on the percentage
returned by the lvdisplay command on each host.

Did you try running /usr/bin/drbdmanage update-pool ?
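
To see what the hosts actually report, something like this should show the thin
pool fill level drbdmanage works from (assuming the default drbdpool volume group) :
# lvs -o lv_name,lv_size,data_percent,metadata_percent drbdpool
# drbdmanage update-pool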

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Blocked I/O on /dev/drbd0

2017-11-03 Thread Julien Escario
Hello,
This is not the first time I'm seeing what is probably a bug in an older version.
An upgrade is on the way, but I need to unblock the situation.

Symptoms :
All LVM (I'm using the thin-LVM backend) and DRBD-related commands are blocked on
opening /dev/drbd0 :
# strace lvdisplay
[]
stat("/dev/drbd0", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 0), ...}) = 0
open("/dev/drbd0", O_RDONLY|O_DIRECT|O_NOATIME

Same with all drbdsetup and lvm commands.

drbdmanage fails with a communication error to the drbdmanaged process :
# drbdmanage n

Error: Cannot connect to the drbdmanaged process using DBus
The DBus subsystem returned the following error description:
org.freedesktop.DBus.Error.TimedOut: Activation of org.drbd.drbdmanaged timed 
out

So, more investigation :
# ps aux | grep " D"
[snip containing all drbd and lvm related commands launched to debug]
root 27352  0.0  0.0  0 0 ?DJun15   0:03 
[drbd_r_.drbdctr]
root 27353  0.0  0.0  0 0 ?DJun15   0:00 
[drbd_r_.drbdctr]
root 29648  0.0  0.0 110012 14472 ?D 2016  93:30 /usr/bin/python
/usr/bin/dbus-drbdmanaged-service


So, as far as I understand, drbdmanaged opened the device exclusively and
doesn't give it back.

Am I right ?
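
To double-check which process is holding the device open (assuming fuser/lsof are
installed) :
# fuser -v /dev/drbd0
# lsof /dev/drbd0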

Is there a way to unblock this without rebooting the whole node ? I tried
drbdmanage shutdown -q.

Kill the process directly ? Is it safe ?

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Manage nodes allocation policy

2017-10-03 Thread Julien Escario
On 03/10/2017 at 14:50, Robert Altnoeder wrote:
> On 10/02/2017 06:20 PM, Julien Escario wrote:
>> Hello,
>> In the doc, I can read : "In this case drbdmanage chooses 3 nodes that fit 
>> all
>> requirements best, which is by default the set of nodes with the most free 
>> space
>> in the drbdpool volume group."
>>
>> Is there a way to change this 'default mode' ? For example, with 2 sites, 
>> asking
>> to have *at least* a copy on each site ?
>>
>> Best regards,
>> Julien Escario
>> ___
>> drbd-user mailing list
>> drbd-user@lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
> 
> Technically yes, practically, for most users right now, no.
> 
> The deployment policy is contained in an exchangable object class, which
> can be configured in the drbdmanage configuration. The corresponding
> configuration line is:
> deployer-plugin = drbdmanage.deployers.BalancedDeployer
> It is technically possible to load another deployer object that contains
> different deployment logic, however, no alternative plugins exist so far.
> The interface of those plugins also does not carry information about a
> node's site associations.
> 
> br,

Hello,
Thanks for clarifying ! It seems we'll have to manually balance data across sites.

Best regards,
Julien Escario

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Manage nodes allocation policy

2017-10-02 Thread Julien Escario
Hello,
In the doc, I can read : "In this case drbdmanage chooses 3 nodes that fit all
requirements best, which is by default the set of nodes with the most free space
in the drbdpool volume group."

Is there a way to change this 'default mode' ? For example, with 2 sites, asking
to have *at least* a copy on each site ?

Best regards,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Promote client-only to complete node ?

2017-09-13 Thread Julien Escario
On 13/09/2017 at 10:14, Yannis Milios wrote:
> Usually when I need that, I live migrate the vm from the client node to the
> other node and then I use drbdmanage unassign/assign to convert client to a
> 'normal' satellite node with local storage. Then wait for sync to complete and
> move the vm back to the original node (if necessary).

Yup, I'm also doing this, but yesterday I encountered a VM with mixed DRBD and
local resources and it couldn't be live migrated. I had to move the local
resources to an NFS server. Not really a pain, but being able to 'simply'
promote the node from client to a real node would have been fun ;-)
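
For the archives, the sequence boils down to something like this (node name is
made up ; the resource has to be Secondary on that node first) :
# drbdmanage unassign vm-260-disk-1 vm6   # drop the diskless (client) assignment
# drbdmanage assign vm-260-disk-1 vm6     # re-assign with local storage, then wait for the sync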

Regards,
Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Promote client-only to complete node ?

2017-09-13 Thread Julien Escario
On 12/09/2017 at 11:39, Roland Kammerer wrote:
> On Tue, Sep 12, 2017 at 09:49:26AM +0200, Julien Escario wrote:
>> Hello,
>> I'm trying to 'promote' a client node to have a local copy of datas but can't
>> find any reference to such command in manual.
>>
>> I tried :
>> vm-260-disk-1 role:Secondary
>>   disk:Diskless
>>   vm4 role:Secondary
>> peer-disk:UpToDate
>>   vm5 role:Secondary
>> peer-disk:UpToDate
>>
>> This one is in secondary state (VM not running) so I could unassign/assign 
>> this
>> ressource to this node without problem.
>>
>> But if the ressource is already in primary state, any way to ask for a local
>> copy of datas with drbdmanage ?
> 
> Definitely not with drbdmanage. You have to get that node in Secondary
> state and then drbdmanage unassign/assign.

Thanks for this clear answer ! Perhaps something to add to the todo list ?
From my point of view, technically nothing seems to block this feature, but in
terms of code base it is probably harder.

Best regards,
Julien Escario



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Authentication of peer failed ?

2017-09-12 Thread Julien Escario
On 12/09/2017 at 00:11, Lars Ellenberg wrote:
> On Mon, Sep 11, 2017 at 11:21:35AM +0200, Julien Escario wrote:
>> Hello,
>> This moring, when creating a ressource from Proxmox, I got a nice
>> "Authentication of peer failed".
>>
>> [33685507.246574] drbd vm-115-disk-1 vm7: Handshake to peer 2 successful: 
>> Agreed
>> network protocol version 111
>> [33685507.246579] drbd vm-115-disk-1 vm7: Feature flags enabled on protocol
>> level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
>> [33685507.246597] drbd vm-115-disk-1 vm7: sock was shut down by peer
>> [33685507.246617] drbd vm-115-disk-1 vm7: conn( Connecting -> BrokenPipe )
>> [33685507.246642] drbd vm-115-disk-1 vm7: short read (expected size 16)
>> [33685507.246644] drbd vm-115-disk-1 vm7: Authentication of peer failed, 
>> trying
>> again.
> 
> Well, in this case, the "authentication" failed, because the connection
> was torn down during the exchange. Which is why it thinks it could help
> to try again. If it had failed auth because, well, "wrong credentials",
> which really is only "shared secret" & node-id not matching,
> it would not try again: the shared secret that "impostor" peer knows
> won't change because we try again.
> 
> Why was it torn down?  I don't know.

A small precision : it's only on one resource (every other one is fine) :

vm-115-disk-1 role:Secondary
  disk:Inconsistent
  vm4 connection:StandAlone
  vm7 connection:StandAlone

This resource isn't defined anymore on the 'vm4' and 'vm7' nodes.

>> My install is a bit outdated :
>> python-drbdmanage0.97-1
>> drbd-utils   8.9.7-1
> 
> And you rather not even mention the kernel module version :-)

Right, sorry :
# cat /proc/drbd
version: 9.0.3-1 (api:2/proto:86-111)
GIT-hash: a14cb9c3818612dfb8c3288db28a591d5a0fc2a6 build by root@nora,
2016-07-28 10:59:06
Transports (api:14): tcp (1.0.0)

> It even wrote that it was trying again, all by itself.
> If it does not do that, but is in fact stuck in some supposedly
> transient state like "Unconnected", you ran into a bug.
> 
> Of course you still can try to "--force disconnect", and/or "adjust".
> Depends on what the current state is.

Tried a few things :
# drbdadm adjust vm-115-disk-1
'vm-115-disk-1' not defined in your config (for this host).

# drbdadm down vm-115-disk-1
'vm-115-disk-1' not defined in your config (for this host).

# drbdmanage unassign vm-115-disk-1 vm5
Error: Object not found

It seems that neither drbdadm nor drbdmanage knows about this
resource anymore. Sad :-(

Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Promote client-only to complete node ?

2017-09-12 Thread Julien Escario
Hello,
I'm trying to 'promote' a client node to have a local copy of the data, but I can't
find any reference to such a command in the manual.

I tried :
vm-260-disk-1 role:Secondary
  disk:Diskless
  vm4 role:Secondary
peer-disk:UpToDate
  vm5 role:Secondary
peer-disk:UpToDate

This one is in Secondary state (VM not running), so I could unassign/assign this
resource on this node without problem.

But if the resource is already in Primary state, is there any way to ask for a local
copy of the data with drbdmanage ?

Best regards,
Julien Escario



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Authentication of peer failed ?

2017-09-11 Thread Julien Escario
Hello,
This morning, when creating a resource from Proxmox, I got a nice
"Authentication of peer failed".

[33685507.246574] drbd vm-115-disk-1 vm7: Handshake to peer 2 successful: Agreed
network protocol version 111
[33685507.246579] drbd vm-115-disk-1 vm7: Feature flags enabled on protocol
level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
[33685507.246597] drbd vm-115-disk-1 vm7: sock was shut down by peer
[33685507.246617] drbd vm-115-disk-1 vm7: conn( Connecting -> BrokenPipe )
[33685507.246642] drbd vm-115-disk-1 vm7: short read (expected size 16)
[33685507.246644] drbd vm-115-disk-1 vm7: Authentication of peer failed, trying
again.
[33685507.289987] drbd vm-115-disk-1 vm7: Connection closed
[33685507.290013] drbd vm-115-disk-1 vm7: conn( BrokenPipe -> Unconnected )

My install is a bit outdated :
python-drbdmanage0.97-1
drbd-utils   8.9.7-1

but upgrading is ... complex ;-)

Any way to correct this without shutting down all resources (and rebooting) ?

I was thinking of some nice hidden command to force a kind of re-auth between the
2 hosts.

Best regards,
Julien Escario



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Changes on drbdmanage

2017-09-08 Thread Julien Escario
On 07/09/2017 at 18:43, Roland Kammerer wrote:
> On Thu, Sep 07, 2017 at 01:48:02PM +0300, Tsirkas Georgios wrote:
>> Hello,
>> What are the changes on drbdmanage command;
> 
> w00t?

I was thinking almost the same thing. Even started writing a flame answer ;-)

Can you unsubscribe such lame users ?

Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Documentation missing page

2017-09-01 Thread Julien Escario
Hello,
Just to let you know that the link
https://docs.linbit.com/doc/users-guide-90/s-proxmox-configuration is dead in
the documentation.

This link is present on
https://docs.linbit.com/doc/users-guide-90/s-proxmox-install/ and on
https://docs.linbit.com/doc/users-guide-90/ch-proxmox/

Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-08-18 Thread Julien Escario
On 17/08/2017 at 16:48, Gionatan Danti wrote:
> Hi list,
> I am discussing how to have a replicated ZFS setup on the ZoL mailing list, 
> and
> DRBD is obviously on the radar ;)
> 
> It seems that three possibilities exist:
> 
> a) DRBD over ZVOLs (with one DRBD resource per ZVOL);
> b) ZFS over DRBD over the RAW disks (with DRBD resource per disk);
> c) ZFS over DRBD over a single huge and sparse ZVOL (see for an example:
> http://v-optimal.nl/index.php/2016/02/04/ha-zfs/)
> 
> What option do you feel is the better one? On the ZoL list seems to exists a
> preference for option b - create a DRBD resource for each disk and let ZFS
> manage the DRBD devices.
> 
> Any thought on that?
> Thanks.
> 

Hello,
I haven't played with ZFS and DRBD, only LVM for now, but similar questions occur.

May I suggest you let drbdmanage be in charge of this question ? I don't know
which option it will choose, but clearly drbdmanage simplifies volume management
A LOT.

The important thing is to have one DRBD resource per (I assume) VM disk.
That way, in case of a split brain, you can still have a primary on every node, with
or without a synchronized peer.

If you design with a single big resource, a simple split brain and you're
screwed.

Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBDmanage (re)initialization

2017-06-12 Thread Julien Escario
On 12/06/2017 at 10:09, Robert Altnoeder wrote:
> On 06/12/2017 09:39 AM, Julien Escario wrote:
> 
>> Finally, I've been able to fully restore vm4 and vm5 (drbdsetup and 
>> drbdmanage
>> working) but not vm7.
>>
>> I've done that by firewalling port 6999 (port used by .drbdctrl ressource) 
>> and
>> issuing a down/up on drbdctrl on vm4 and vm5.
>>
>> [...]
>>
>> It would be really nice to get back to normal without a reboot. Any advice ?
> Force-disconnecting and/or firewalling the ports of all DRBD resources
> sometimes unblocks stuck kernel threads, but if even that does not help,
> then there is virtually no way to recover from a stuck kernel without
> rebooting it (well, maybe with a kernel debugger there would be, but
> that's a rather theoretical possibility).

Nice idea ! Let's try this on vm7 :
# iptables -A INPUT -i eth1.1007 -p tcp --match multiport --dports 7000:7030 -j 
DROP
Same for outbound (to be sure) :
# iptables -A OUTPUT -o eth1.1007 -p tcp --match multiport --dports 7000:7030 -j
DROP

root@vm4:~# cat /var/lib/drbd.d/drbdmanage_vm-104-disk-1.res
[...]
   connection-mesh {
hosts vm7 vm4;
   }
on vm7 {
node-id 1;
address 10.152.12.60:7007;
[...]

So, this resource (vm 104, disk 1) uses port 7007.

Check :
root@vm4:~# nmap -p7007 vm7
PORT STATESERVICE
7007/tcp filtered afs3-fileserver

TCP sessions were still open, so I killed them all with tcpkill :
root@vm4:~# tcpkill -i eth1.1007 portrange 7000-7030

All TCP sessions disappeared from netstat and my resources have gone to Connecting
state.

But the situation isn't really better ... DRBD processes are still blocked on an
undetermined I/O. I'll wait a few hours to see if something finally times out on
the server and, if it doesn't, I'll resign myself and reboot the whole box
(and UPGRADE !).

Thanks for your help !
Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBDmanage (re)initialization

2017-06-12 Thread Julien Escario
On 12/06/2017 at 09:57, Roland Kammerer wrote:
> Without access to that machine, I'd say that is how you have to resolve
> it (reboot). And yes, we also saw these hangs in old drbd9 versions.

Ok, good to know. Of course, I won't ask you to go further without a proper
support contract. It was just to check if I can solve this situation by myself.

> Please do yourself a favor and update to a recent version of the whole
> DRBD stack. All the version information I saw in that thread is scary...

Thanks for the advice. I'll upgrade all versions BUT is it safe to upgrade
drbd-utils 9.0.0, drbdmanage 0.99.5-1 and drbd 9.0.7 on a node, reboot, move
running resources away from a node, do the same, and so on for the last one ?

AFAIK I have to keep consistency about releases on the same node (of course) but
what will happen with different versions on different nodes ?
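
Spelled out, the per-node sequence I have in mind (a sketch only - package names as
in the dpkg output above; which package provides the drbd 9 kernel module depends on
the setup, and whether mixed versions between nodes are OK during the rolling
upgrade is exactly my question) :

1. # apt-get update && apt-get install drbd-utils python-drbdmanage
   (plus whatever provides the drbd kernel module on that node)
2. reboot the node
3. move/migrate the resources running on the next node away, then repeat there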

Thanks a lot !
Julien Escario



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBDmanage (re)initialization

2017-06-12 Thread Julien Escario
Le 09/06/2017 à 14:24, Julien Escario a écrit :
> Le 09/06/2017 à 09:59, Robert Altnoeder a écrit :
>> On 06/08/2017 04:14 PM, Julien Escario wrote:
>>> Hello,
>>> A drbdmanage cluster is actually stuck in this state :
>>> .drbdctrl role:Secondary
>>>   volume:0 disk:UpToDate
>>>   volume:1 disk:UpToDate
>>>   vm4 connection:NetworkFailure
>>>   vm7 role:Secondary
>>> volume:0 replication:WFBitMapS peer-disk:Inconsistent
>>> volume:1 peer-disk:Outdated
>>> [...]
>>> Any way to restart this ressource without losing all other ressources ?
>> on vm4 and vm7, try 'drbdadm down .drbdctrl' followed by 'drbdadm up
>> .drbdctrl'.
>> In most cases, it just reconnects and fixes itself.

[Sorry for the double post]

Finally, I've been able to fully restore vm4 and vm5 (drbdsetup and drbdmanage
working) but not vm7.

I've done that by firewalling port 6999 (the port used by the .drbdctrl resource) and
issuing a down/up on .drbdctrl on vm4 and vm5.

So far, so good.

It seems the pure DRBD part is somewhat screwed on vm7. I can't issue any
drbdadm/drbdsetup command. They all hang and keep running, and can't be killed
by timeout or kill (even -9).

With strace, the drbdsetup status output is as in the attached file. It seems to hang
while writing to a netlink socket but I'm not really familiar with strace output.

It would be really nice to get back to normal without a reboot. Any advice ?

Thanks for your help,
Julien
root@vm7:~# strace drbdsetup status
execve("/usr/sbin/drbdsetup", ["drbdsetup", "status"], [/* 15 vars */]) = 0
brk(0)  = 0x19c
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f267af3b000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=48549, ...}) = 0
mmap(NULL, 48549, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f267af2f000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1738176, ...}) = 0
mmap(NULL, 3844640, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f267a972000
mprotect(0x7f267ab14000, 2093056, PROT_NONE) = 0
mmap(0x7f267ad13000, 24576, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a1000) = 0x7f267ad13000
mmap(0x7f267ad19000, 14880, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f267ad19000
close(3)= 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f267af2e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f267af2d000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f267af2c000
arch_prctl(ARCH_SET_FS, 0x7f267af2d700) = 0
mprotect(0x7f267ad13000, 16384, PROT_READ) = 0
mprotect(0x61a000, 4096, PROT_READ) = 0
mprotect(0x7f267af3d000, 4096, PROT_READ) = 0
munmap(0x7f267af2f000, 48549)   = 0
chdir("/")  = 0
stat("/proc/drbd", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
open("/proc/drbd", O_RDONLY)= 3
brk(0)  = 0x19c
brk(0x19e2000)  = 0x19e2000
read(3, "version: 9.0.3-1 (api:2/proto:86"..., 4095) = 162
close(3)= 0
brk(0x19e1000)  = 0x19e1000
socket(PF_NETLINK, SOCK_DGRAM, NETLINK_GENERIC) = 3
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [1048576], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [1048576], 4) = 0
bind(3, {sa_family=AF_NETLINK, pid=0, groups=}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, pid=5040, groups=}, [12]) = 0
write(3, " \0\0\0\20\0\1\0\5D>Y\260\23\0\0\3\2\0\0\t\0\2\0drbd\0\0\0\0", 32


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBDmanage (re)initialization

2017-06-09 Thread Julien Escario
Le 09/06/2017 à 09:59, Robert Altnoeder a écrit :
> On 06/08/2017 04:14 PM, Julien Escario wrote:
>> Hello,
>> A drbdmanage cluster is actually stuck in this state :
>> .drbdctrl role:Secondary
>>   volume:0 disk:UpToDate
>>   volume:1 disk:UpToDate
>>   vm4 connection:NetworkFailure
>>   vm7 role:Secondary
>> volume:0 replication:WFBitMapS peer-disk:Inconsistent
>> volume:1 peer-disk:Outdated
>> [...]
>> Any way to restart this ressource without losing all other ressources ?
> on vm4 and vm7, try 'drbdadm down .drbdctrl' followed by 'drbdadm up
> .drbdctrl'.
> In most cases, it just reconnects and fixes itself.

Many thanks !
BUT, it seems I have a bigger problem on vm7.

First, vm4 and vm5 are secondary on the .drbdctrl resource.

Running 'drbdadm status' on vm7 times out, as does any drbdsetup command.

In the logs on vm4, I have :
[25578277.437480] drbd .drbdctrl: Preparing cluster-wide state change 3545178393
(0->3 499/146)
[25578277.437809] drbd .drbdctrl: Aborting cluster-wide state change 3545178393
(0ms) rv = -19

Which is probably normal as drbdmanage can't setup primary state.

On vm5, kernel messages are slightly different :
[25574921.080463] drbd .drbdctrl vm7: Ignoring P_TWOPC_ABORT packet 2546904845.

And on vm7, almost the same :
[25396073.307115] drbd .drbdctrl vm4: Rejecting concurrent remote state change
2590742863 because of state change 2272799652
[25396073.307390] drbd .drbdctrl vm4: Ignoring P_TWOPC_ABORT packet 2590742863.


Showing drbdmanage nodes is fine on vm7 :
> # drbdmanage n
> +------+-----------+-----------+-------+
> | Name | Pool Size | Pool Free | State |
> |------+-----------+-----------+-------|
> | vm4  |    921600 |    363755 |    ok |
> | vm5  |    921600 |    329840 |    ok |
> | vm7  |    921600 |    380712 |    ok |
> +------+-----------+-----------+-------+

But it times out on vm4 and vm5 which, as before, makes sense if the .drbdctrl resource
can't go primary (correct me if I'm wrong).

So it seems things are screwed up on vm7, but VMs are still backed up successfully
each night.

Any idea how to get out of this situation without having to reboot the whole node ?
Being unable to run any drbdsetup command successfully makes me think that only
a reboot will do the trick.

Thanks !
Julien Escario



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBDmanage (re)initialization

2017-06-08 Thread Julien Escario
Hello,
A drbdmanage cluster is currently stuck in this state :
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  vm4 connection:NetworkFailure
  vm7 role:Secondary
volume:0 replication:WFBitMapS peer-disk:Inconsistent
volume:1 peer-disk:Outdated


The view differs between nodes but it seems we have some kind of split brain on the
.drbdctrl resource and drbdmanage doesn't manage anything ;-)

Any way to restart this resource without losing all the other resources ?

# cat /proc/drbd
version: 9.0.3-1 (api:2/proto:86-111)

# drbdmanage -v
drbdmanage 0.97; GIT-hash: d0032c6a22c29812263ab34a6e856a5b36fd7da0

Of course, same versions on three nodes.

Thanks for your help,
Julien Escario



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Ressource decomission stuck

2016-08-19 Thread Julien Escario
Le 18/08/2016 à 13:33, Julien Escario a écrit :
> Hello,
> After rebooting a node, I can see something strange :
> 
> # drbdmanage list-assignments
>> | vm4  | vm-206-disk-1 | * | | ok                                       |
>> | vm5  | vm-206-disk-1 | * | | ok                                       |
>> | vm7  | vm-206-disk-1 | * | | FAILED(3), pending actions: decommission |
> 
> So it seems the resource removal is stuck in the middle of the operation.

Even after rebooting the node, the pending action is still there.

Any advice ?

Best regards,
Julien




___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Ressource decomission stuck

2016-08-18 Thread Julien Escario
Hello,
After rebooting a node, I can see something strange :

# drbdmanage list-assignments
> | vm4  | vm-206-disk-1 | * | | ok                                       |
> | vm5  | vm-206-disk-1 | * | | ok                                       |
> | vm7  | vm-206-disk-1 | * | | FAILED(3), pending actions: decommission |

So it seems the resource removal is stuck in the middle of the operation.

I already tried different actions so I can't assume it was in this state before.

I'm using lvm thin as the storage plugin, I don't have any vm-206-disk-1 LV any more,
and drbdadm status returns nothing about this resource.

Is there a way to remove this properly ? I guess I would then be able to reassign the
resource to this node afterwards.

And moving /var/lib/drbd.d/drbdmanage_vm-206-disk-1.res away didn't change 
anything.

Best regards,
Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Is DRBD9 ready for production

2016-08-18 Thread Julien Escario
Le 11/08/2016 09:10, Ml Ml a écrit :
> Hello List,
> 
> i wonder if DRBD9 is ready for production?
> 
> I posted my Problem here:
>   http://lists.linbit.com/pipermail/drbd-user/2016-April/022893.html
> 
> And i ran into this problem a few times now. So i switched to a 2 Node
> Setup (which works fine so far).
> 
> Everytime i go back to a 3 Node Cluster it takes only a few days and i
> run into the "ASSERTION FAILED: pp_in_use" problem again.
> 
> Please tell me if this is "just" a bug (bugs can happen) or if i
> should NOT use DRBD9 for production environment.

Just to give a little feedback : we're running a 3-node Proxmox cluster with
DRBD9 and drbdmanage and everything works as expected.

Of course, it didn't come up out of the box and a test cluster was useful in
order to test drive everything together, know where the caveats are and what
could be done or not.

I've almost completed the upgrade of our cluster to the latest proxmox version by
migrating all VMs one by one, upgrading, rebooting, etc ...

This should be done in a specific order and I had to read a lot of 
documentation.

Just to complete : we had a switch outage and our 3 nodes were fully disconnected
at once a few weeks ago : nothing special happened; once the switch came back a
few minutes later, all resources resynced and we didn't have a crash,
split-brain, etc ...

The last strange behavior I had is that after rebooting nodes, a few resources
were in the StandAlone state. As it's a production cluster, I didn't try to resolve
this the proper way and asked for an unassign/re-assign. This required a full resync
and a little time but afterwards, all resources went back to normal.
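
Roughly this, per affected resource (names below are placeholders; check
'drbdmanage --help' for the exact subcommand spelling in your version) :

# drbdmanage unassign-resource vm-XXX-disk-1 nodeA
# drbdmanage assign-resource vm-XXX-disk-1 nodeA
# drbdmanage list-assignments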

So MY answer is : DRBD9 is production ready but be prepared to run a test
cluster and make some crash tests before going into production. The test cluster
MUST remain active so you can try any upgrade procedure before applying it to production.

My 2cts,
Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdmanage v0.97

2016-08-18 Thread Julien Escario
Le 17/08/2016 12:19, Roland Kammerer a écrit :
> On Wed, Aug 17, 2016 at 11:34:22AM +0200, Julien Escario wrote:
>> So my question now : is there a way to restart drbdmanage 'server' without
>> having to restart the whole server ? As it's dbus, I don't want to create a
>> mess.
> 
> "drbdmanage restart -q"

Great ! It seems to be a new command, I can't see it in previous drbdmanage
versions (at least not in 0.91).

It works like a charm, no resource was disconnected on my 2-node cluster,
everything continued to work as expected.

And right after, I created a 100GB disk which went UpToDate instantly while
consuming around 0 disk space ON BOTH NODES.

This is really a great feature, many thanks !

Best regards,
Julien




___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdmanage v0.97

2016-08-17 Thread Julien Escario
Hello,
First, thanks for this release ! It comes with a lot of great features, the most
important for us being REAL thin provisioning.

The last missing feature is multiple pools.

I just upgraded our test cluster this morning :
# cat /proc/drbd
version: 9.0.3-1 (api:2/proto:86-111)

ii  drbd-utils 8.9.7-1  amd64
ii  python-drbdman 0.97-1   all

We're using lvm-thinpool.

I created a new volume and a complete resync was done, eating all my disks :-(

This is probably due to :
# drbdmanage server-version
server_version=0.96.1

So I rebooted one of the machines and now :
# drbdmanage server-version
server_version=0.97

So my question now : is there a way to restart the drbdmanage 'server' without
having to restart the whole server ? As it's dbus, I don't want to create a
mess. I would like to test the procedure on the second node.

Best regards,
Julien Escario


Le 15/07/2016 13:48, Roland Kammerer a écrit :
> Dear DRBD9 & drbdmanage users & testers,
> 
> we have published version 0.97 of drbdmanage.
> 
> While drbdmanage 0.96 was a release with minor fixes and mainly for
> internal reasons, version 0.97 is a release with many bug/reliability
> fixes. Noteworthy changes include, but are not limited to:
> 
> - ZFS thin plugin
> - drbdmange dbus trace/replay
> - initial unit testing
> - fixes for resize
> - drbdmanage message log (drbdmanage ml)
> - ClientHelper class (introduced in v0.96) which should be used by
>   external projects written in Python like the new drbd docker plugin.
> - skip initial sync on thinly provisioned storage
> - ...
> 
> If you use drbdmanage, you certainly want to upgrade. Make sure
> you also use the drbd kernel module version 9.0.3 and drbd-utils version
> 8.9.7 (strict requirement, or drbdmanage eats your kittens).
> 
> Changes since v0.96:
> 
> [ Philipp Marek ]
>  * Fix description text.
>  * ClientHelper bugfix: _LW() only takes one argument here.
>  * Typos fixed: "drbdmange" => "drbdmanage".
>  * Storage header: dump the serial-number too.
>  * Satellites: Use "bz2" compression instead of "zlib".
>  * Human-readable output: never use 3 digits after the comma.
> 
> [ Robert Altnoeder ]
>  * DrbdManager: If drbdadm could not be spawned, fall back to drbdsetup too
>  * Fixed storage plugin remove_blockdevice() infinite loop
>  * Mark assignments as deployed before undeploying them
>  * Do not skip removing block devices when undeploying diskless assignments
>  * DrbdAdm: fixed duplicate logging
>  * Add reasonable thin provisioning capability
>  * DrbdManager: _deploy_volume_actions(): fail on metadata creation failure
>  * Update resource files instead of piping to drbdadm
>  * Copy drbdsetup options on snapshot restore into new resource
>  * Added missing argument to format string
>  * list_nodes(): Check node online status only on control nodes
>  * utils.py: new get_free_number() implementation
>  * DrbdManager: fixed reference to wrong class
>  * DrbdManager: Remember thin volumes and skip resyncing after resize if  
> [...]
>  * Reordered "drbdadm resize -- --assume-clean" -> "drbdadm -- --assume-c 
> [...]
>  * Return success for no-op on blockdevice resize on volume states withou 
> [...]
>  * resize: Check for size increment > poolsize instead of gross size > 
> poolsize
>  * resize: ignore clients, resize only if the assignment is in a sane state
>  * Added logger classes for logging external command output
>  * Changed _run_drbdadm() to use ExternalCommandBuffer
>  * commands: _run_drbdadm() transformed into a more generic run_drbdutils()
>  * commands: removed unnecessary import 'subprocess'
>  * Hide init/join external command output unless exit_code != 0
>  * Fixed class member -> instance member
>  * Handle assignment transitions to and from diskless
>  * Set FLAG_ATTACH cstate & tstate on client->server transition
>  * Verbose drbdadm command logging (drbdadm -vvv)
>  * Skip new-current-uuid when restoring snapshots
>  * Allow double open(), as it can be caused by the async D-Bus calls
>  * DrbdManager: fail_count fixes, more logging, removed duplicate code block
>  * Added property consts.MANAGED
>  * restore-resize-snapshot partial implementation
>  * Added MessageLog class for tracking recent info/warning/alert messages
>  * Message log commands, emit messages to the message log
>  * More fixes for D-Bus/Python surprises in restore_snapshot()
>  * MessageLog: configurable number of backlog entries
> 
> [ Roland Kammerer ]
>  * satellite: improve connect stage
>  * allow resource options in the common section
>  * add missing space to description
>  * plugins: added zvol thin plugin
>  * added role s

Re: [DRBD-user] A question

2016-02-18 Thread Julien Escario

Hello,
Using /proc/drbd doesn't sound like a good idea : with drbd9, there's
nothing but the version in there.


Perhaps you should use drbdadm and the other command line tools ?
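
For example, with DRBD 9 and a recent drbd-utils (option names from memory, see the
drbdsetup man page) :

# drbdsetup status --verbose --statistics

shows the resync progress as a 'done:' percentage, but still no ETA in seconds, so
you would have to compute the remaining time yourself from the rate.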

Regards,
Julien

Le 18/02/2016 19:59, Digimer a écrit :

Hi all,

   I'm working on a program that (amoung other things) manages DRBD. Can
you get the estimated number of seconds remaining for a resync
operation? /proc/drbd shows "finish: 1:19:39 speed: 3,096 (3,316) K/sec"
and I'd like to avoid having to translate that human-readable time into
seconds if the data is already available directly somewhere.

Thanks!


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Pool free report by drbdmanage can't be trust

2016-02-04 Thread Julien Escario
Le 04/02/2016 10:07, Robert Altnoeder a écrit :
> On 01/29/2016 11:08 AM, Julien Escario wrote:
>> Le 25/01/2016 15:19, Julien Escario a écrit :
>>> So I'm wondering how and when 'pool free' value is calculated. Is it
>>> recalculated only when a new ressource is created ? deleted ?
> It is normally recalculated whenever drbdmanage changes something that
> modifies the amount of available space.

Right now, this is not the case, with python-drbdmanage 0.91-1 at least (and the
lvm-thinlv backend). Creating a new resource doesn't seem to trigger a recalculation
of free space.

Perhaps it really is the case at *creation* time. But on the SyncSource, thin
allocation means there's nearly zero space used, whereas with the initial sync
there's a lot of space used on the SyncTarget. After the initial sync, free space is
not recalculated, for sure.

> Another problem is that any backend that uses thin allocation
> essentially returns data from fantasy land, so the pool free value in
> drbdmanage will never be anything better than a rough estimate. Actually
> allocating a volume of the reported free size might work or might not
> work, depending on how much actual storage ends up being allocated upon
> creating the resource on each node.

Yup ! That's currently my major concern. The data returned by lvm-thin is pretty
'logical' and the pool is slowly filled. That does not seem to be the case for
Urban Larsson, who gets really strange numbers.

No, the real issue is about sync on thin LVM. But perhaps we have a solution
with 9.0.1 and the rs-discard-granularity parameter.

It seems to be a drbdsetup parameter. Could it be used with drbdmanage ? In a
future release ?

I would be happy to know how this should be set. (depends upon backend storage
block size ?).
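
For reference, as a plain drbd.conf disk option it would presumably look like this
(the value is only an example and should probably match the discard granularity /
chunk size of the thin pool underneath - how drbdmanage exposes it is exactly the
open question above) :

disk {
    # value in bytes, example only
    rs-discard-granularity 65536;
}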

> Right now, DRBD still full-syncs (which will also change in the future,
> because obviously that does not make a lot of sense with thin
> allocation), and drbdmanage does not yet have lots of logic for
> estimating thin allocation, and for both reasons, all the values
> returned by drbdmanage as it is now are usually very conservative and
> fat-allocation-like regarding free space.

Not that much. I do get a fairly accurate free space report on my 3-node setup. At
least, I'm conservative with the disks on the SyncSource node : I'm creating VMs
with a round-robin-like algorithm. Not really the worst.

But, yes, having the possibility to 'jump' over the initial sync would be a great
feature ! (as we KNOW that there's no data at creation time).

>>> Is there a way to force it to rescan free space ? (with a dbus command 
>>> perhaps ?)
>>>
>>> That could perhaps be done by a cron job running at defined frequency ?
> We have a prototype of a daemon that does something like that
> (drbdmanage-poolmonitord). While update-pool always locks the control
> volumes and updates pool data, the daemon uses a monitoring function
> (the update_pool_check() D-Bus API) that first checks whether the amount
> of space has changed, and only if it did, it triggers the update-pool
> command.
> That will be released with some future release of drbdmanage, as it will
> be especially useful if a storage pool is shared by drbdmanage and other
> allocators.

Well, sharing the storage with other allocators doesn't really seem a good
idea for now ;-)

If two or more nodes try to lock the control volumes, is there some kind of
waiting queue to get the lock ? I suppose this has already been anticipated.

Thanks for the details ! I would be happy to help you with tests in
different scenarios, from a 'customer' point of view, if that could help you !

Best regards,
Julien




___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdmanage Pool Free calculation

2016-02-03 Thread Julien Escario
Le 02/02/2016 16:08, Urban Larsson a écrit :
> root@node1:/var/lib/drbd.d# lvs
>   LV            VG       Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   vm-200-disk-1 backup   -wi-ao     2.24t
>   .drbdctrl_0   drbdpool -wi-ao     4.00m
>   .drbdctrl_1   drbdpool -wi-ao     4.00m
>   drbdthinpool  drbdpool twi-aotz-- 1.69t              43.24  87.26

I'm really surprised by your high percentage of metadata (87.26%).

Did you take some snapshots ? They should be displayed by the lvs command, but perhaps
some are still pending deletion or something like that ?

Could you try running
# lvs -o+metadata_percent

And report the output ?

Regards,
Julien Escario



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Pool free report by drbdmanage can't be trust

2016-01-29 Thread Julien Escario
Le 25/01/2016 15:19, Julien Escario a écrit :
> So I'm wondering how and when 'pool free' value is calculated. Is it
> recalculated only when a new ressource is created ? deleted ?
> 
> Is there a way to force it to rescan free space ? (with a dbus command 
> perhaps ?)
> 
> That could perhaps be done by a cron job running at defined frequency ?

Finally found the solution (thanks to Urban Larsson from Sweden) :

Simply run 'drbdmanage update-pool' frequently as a cron job ON EVERY NODE to keep
track of the thin allocation (it only updates the running node's free space).

Not really a great workaround but it might do the job for now.
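
Something like this in a cron.d snippet, for instance (path and frequency are just an
example) :

# cat /etc/cron.d/drbdmanage-update-pool
*/10 * * * * root /usr/bin/drbdmanage update-pool >/dev/null 2>&1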

This opens another question for thin allocation on SyncTarget(s) but that's
another thread.

Best regards,
Julien Escario



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Misleading error messages from drbdadm up (IP not found on this host)

2016-01-27 Thread Julien Escario
Le 27/01/2016 12:10, Matthew Vernon a écrit :
> resource mws-priv-7 {
>   device /dev/drbd87;
>   disk /dev/guests/mwsig-mws-priv-7;
>   meta-disk internal;
>   on agogue {
> address ipv6 [fd19:1b70:f7a6:1ae5::8d:6]:7875;
> }
>   on odochium {
> address ipv6 [fd19:1b70:f7a6:1ae5::8d:7]:7875;
> }
>   net {
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
>   }
> }

This is just a guess, but is it possible that you have another resource on the same
port number on the same IPv6 address ?

This could perhaps lead to the unknown-IP-address message (although in that case it
should rather say 'unable to bind port : already in use by another process').
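
A quick way to check (the port number is taken from your resource definition above) :

# ss -tlnp | grep 7875
# grep -rn 7875 /etc/drbd.d/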

Regards,
Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Pool free report by drbdmanage can't be trust

2016-01-25 Thread Julien Escario
Hello,
So to continue with my experiments with drbdmanage and thinlv plugin :

drbdmanage 0.91 (same thing seems to happen with 0.50).
storage-plugin = drbdmanage.storage.lvm_thinlv.LvmThinLv

I create a VM with proxmox, 10 GB disk.

Right after, I got this free pool space with drbdmanage list-nodes :
> +------+-----------+-----------+-------+
> | Name | Pool Size | Pool Free | State |
> |------+-----------+-----------+-------|
> | vm4  |    523264 |    521066 |    ok |
> | vm5  |    523264 |    521066 |    ok |
> +------+-----------+-----------+-------+

This is right for vm4 (the thin pool is 511 GB) with thin provisioning :
  --- Logical volume ---
  LV Namedrbdthinpool
  LV Size511.00 GiB
  Allocated pool data0.00%
  Allocated metadata 0.43%

but can't be considered as right for vm5 :
  --- Logical volume ---
  LV Namedrbdthinpool
  LV Size511.00 GiB
  Allocated pool data1.96%
  Allocated metadata 1.43%

AFAIK, drbdmanage should report 511*1024*(1-1.96/100) = 513008 MB free (that is,
roughly the 523264 MB pool minus my 10GB disk).

Now, I created a 5GB disk :
> +------+-----------+-----------+-------+
> | Name | Pool Size | Pool Free | State |
> |------+-----------+-----------+-------|
> | vm4  |    523264 |    521013 |    ok |
> | vm5  |    523264 |    505525 |    ok |
> +------+-----------+-----------+-------+

Wow, so that WAS updated, even before the initial sync finished.

Let's wait for the sync to finish and delete this new 5GB disk : still the same
report.

So I'm wondering how and when the 'pool free' value is calculated. Is it
recalculated only when a new resource is created ? deleted ?

Is there a way to force it to rescan free space ? (with a dbus command perhaps ?)

That could perhaps be done by a cron job running at defined frequency ?

Regards,
Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdmanage : error in reporting free space with thin-lvm ?

2016-01-25 Thread Julien Escario
Le 23/01/2016 09:25, Roland Kammerer a écrit :
> On Fri, Jan 22, 2016 at 07:48:25PM +0100, Julien Escario wrote:
>> Seems I found a anwser : I was using drbdmanage 0.91 with a thin lv but this
>> version is using drbdmanage.storage.lvm.Lvm as default plugin.
>>
>> I'm now wondering how I can change the default plugin BEFORE initializing 
>> the nodes.
> 
> The basic idea is to get rid of all node-specific configurations as much
> as possible. drbdmanage is also used in dynamic openstack clouds with
> many nodes, where node specific configurations do not scale. Updating
> configuration files (and keeping them consistent) is error prone and it
> was one of the goals of drbdmanage to avoid that for drbd resource
> configuration files. So you usually "init" one, "edit-config" and set
> the storage plugin in the global section. Done, and the values are
> stored in the control volume and consistent for the whole cluster.

From my point of view, that seems absolutely normal. You're building a highly
available network storage, not a storage plugin for a hypervisor.

> If you are in a closed universe like proxmox, patching the source is a
> legitimate hack. In general we try to "protect" (patronize if you will)
> the user from running into problems, which would happen _for_ _sure_ if
> we allow too many node specific settings in /etc/drbdmanaged.cfg.

Yup, but it only seems to simplify deployment of drbd9 with proxmox4 by removing
the need to run edit-config before adding nodes.

I just retried with the python-drbdmanage 0.91 package and everything seems fine if
we specify the plugin after init and before adding nodes.
It fits better with my sense of 'best practice'.
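
In other words (the IP addresses and the second node name below are placeholders) :

# drbdmanage init 10.0.0.1
# drbdmanage edit-config      (set storage-plugin = drbdmanage.storage.lvm_thinlv.LvmThinLv)
# drbdmanage add-node vm5 10.0.0.2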

But I still have concerns about the values given by drbdmanage list-nodes with the
thinlv plugin. Let's clarify this in another thread.

Julien




___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdmanage : error in reporting free space with thin-lvm ?

2016-01-25 Thread Julien Escario
Le 23/01/2016 08:16, Dietmar Maurer a écrit :
>> Seems I found a anwser : I was using drbdmanage 0.91 with a thin lv but this
>> version is using drbdmanage.storage.lvm.Lvm as default plugin.
>>
>> I'm now wondering how I can change the default plugin BEFORE initializing the
>> nodes.
> 
> The lvmthin driver is the default if you use the proxmox drbdmanage 
> package (v0.91 is on pvetest).
> 
> We currently use this patch to change the default for our needs:
> 
> 
> Index: new/drbdmanage/server.py
> ===
> --- new.orig/drbdmanage/server.py
> +++ new/drbdmanage/server.py
> @@ -147,7 +147,7 @@ class DrbdManageServer(object):
>  
>  # defaults
>  CONF_DEFAULTS = {
> -KEY_STOR_NAME  : "drbdmanage.storage.lvm.Lvm",
> +KEY_STOR_NAME  : "drbdmanage.storage.lvm_thinlv.LvmThinLv",
>  KEY_DEPLOYER_NAME  : "drbdmanage.deployers.BalancedDeployer",
>  KEY_MAX_NODE_ID: str(DEFAULT_MAX_NODE_ID),
>  KEY_MAX_PEERS  : str(DEFAULT_MAX_PEERS),

Ok, that's the explanation. I'll try the pvetest package and send feedback
to the proxmox-users list.

Thanks,
Julien Escario




___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdmanage : error in reporting free space with thin-lvm ?

2016-01-22 Thread Julien Escario
Seems I found an answer : I was using drbdmanage 0.91 with a thin LV but this
version uses drbdmanage.storage.lvm.Lvm as the default plugin.

I'm now wondering how I can change the default plugin BEFORE initializing the
nodes.

Regards,
Julien Escario

Le 22/01/2016 11:53, Julien Escario a écrit :
> Hello,
> Today, I've been stuck in creation a new volumes (with proxmox) on a 2 nodes
> cluster because it reports "Not enough free space".
> 
> I checked a few settings on the machines :
>> # drbdmanage list-nodes
>> +------+-----------+-----------+-------+
>> | Name | Pool Size | Pool Free | State |
>> |------+-----------+-----------+-------|
>> | vm4  |    976632 |     22784 |    ok |
>> | vm5  |    953864 |        16 |    ok |
>> +------+-----------+-----------+-------+
> 
> (same on both nodes, copies set to 2).
> 
> So drbdmanage reports 16MB free on vm5 which is really strange because I only
> have on lv of 100GB on the thinpool :
>> # lvs
>>   LV               VG       Attr       LSize   Pool         Origin Data%  Meta%  Move Log Cpy%Sync Convert
>>   .drbdctrl_0      drbdpool -wi-ao       4.00m
>>   .drbdctrl_1      drbdpool -wi-ao       4.00m
>>   drbdthinpool     drbdpool twi-aotz-- 931.25g                     10.74   5.68
>>   vm-100-disk-1_00 drbdpool Vwi-aotz-- 100.02g drbdthinpool        100.00
> 
> So 100GB on a 931,25GB thinpool takes all the place ? Where did my free space 
> go ?
> 
> Investigating a bit further shows that drbdmanage reports free space on the 
> VG :
>> # vgs
>> VG #PV #LV #SN Attr VSize VFree
>> drbdpool 1 4 0 wz--n- 931.51g 16.00m
> 
> Could this be a bug in free space reporting by drbdmanage with thin-lvm 
> backend
> or am I missing something ?
> 
> Let continue :
> I can see that I still have deleted volumes and ressources :
>> # drbdmanage list-resources
>> +---------------+-------------------------+
>> | Name          | State                   |
>> |---------------+-------------------------|
>> | vm-100-disk-1 | ok                      |
>> | vm-101-disk-1 | pending actions: remove |
>> | vm-101-disk-2 | ok                      |
>> | vm-109-disk-1 | ok                      |
>> +---------------+-------------------------+
> 
> But removing them doesn't change a thing :
>> # drbdmanage delete-resource vm-109-disk-1
>> You are going to remove the resource 'vm-109-disk-1' and all of its volumes 
>> from all nodes of the cluster.
>> Please confirm:
>> yes/no: yes
>> Operation completed successfully
>> # drbdmanage list-nodes
>> +------+-----------+-----------+-------+
>> | Name | Pool Size | Pool Free | State |
>> |------+-----------+-----------+-------|
>> | vm4  |    976632 |     22784 |    ok |
>> | vm5  |    953864 |        16 |    ok |
>> +------+-----------+-----------+-------+
> 
> Any idea of what could cause this ? Perhaps the pending action ? Is there a 
> way
> to force this ?
> 
> Best regards,
> Julien
> 
> 
> 
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
> 




___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD9 not syncing permanently??

2016-01-22 Thread Julien Escario
Le 21/01/2016 11:59, Rudolf Kasper a écrit :
> Hi,
> 
> i got a question. We've got a setup with three nics. One of them is cross-over
> and only for drbd use. So i expect that we can replicate 120M/s over this nic
> constantly. But when i transfer some files to the drbd device i see sometimes
> traffic on the device and sometimes not.  I got a adaptec 6405 and 
> write-caching
> is disabled so this couldn't be the clue. OS is debian 8. When i'm looking 
> into
> iotop i can see total disk write, but only sometimes there is a peak of actual
> disk write, after this peak the nic generates traffic.
> 
> Is there a disk-cache running? How can i check this and disable it?

You also have a cache at the OS level. Perhaps some kind of delayed write ?

Did you try running :
# sync
on the copying node ?

You can also make a test with :
# dd if=/dev/zero of=/drbdvolume/testfile bs=1G count=1 oflag=direct

Of course, change /drbdvolume/ to the mountpoint of your drbd device.

Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] drbdmanage : error in reporting free space with thin-lvm ?

2016-01-22 Thread Julien Escario
Hello,
Today, I got stuck creating a new volume (with proxmox) on a 2-node
cluster because it reports "Not enough free space".

I checked a few settings on the machines :
> # drbdmanage list-nodes
> +------+-----------+-----------+-------+
> | Name | Pool Size | Pool Free | State |
> |------+-----------+-----------+-------|
> | vm4  |    976632 |     22784 |    ok |
> | vm5  |    953864 |        16 |    ok |
> +------+-----------+-----------+-------+

(same on both nodes, copies set to 2).

So drbdmanage reports 16MB free on vm5, which is really strange because I only
have one LV of 100GB on the thin pool :
> # lvs
>   LV               VG       Attr       LSize   Pool         Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   .drbdctrl_0      drbdpool -wi-ao       4.00m
>   .drbdctrl_1      drbdpool -wi-ao       4.00m
>   drbdthinpool     drbdpool twi-aotz-- 931.25g                     10.74   5.68
>   vm-100-disk-1_00 drbdpool Vwi-aotz-- 100.02g drbdthinpool        100.00

So 100GB on a 931.25GB thin pool takes all the place ? Where did my free space
go ?

Investigating a bit further shows that drbdmanage reports free space on the VG :
> # vgs
> VG #PV #LV #SN Attr VSize VFree
> drbdpool 1 4 0 wz--n- 931.51g 16.00m

Could this be a bug in free space reporting by drbdmanage with thin-lvm backend
or am I missing something ?

Let's continue :
I can see that I still have deleted volumes and resources :
> # drbdmanage list-resources
> +---------------+-------------------------+
> | Name          | State                   |
> |---------------+-------------------------|
> | vm-100-disk-1 | ok                      |
> | vm-101-disk-1 | pending actions: remove |
> | vm-101-disk-2 | ok                      |
> | vm-109-disk-1 | ok                      |
> +---------------+-------------------------+

But removing them doesn't change a thing :
> # drbdmanage delete-resource vm-109-disk-1
> You are going to remove the resource 'vm-109-disk-1' and all of its volumes 
> from all nodes of the cluster.
> Please confirm:
> yes/no: yes
> Operation completed successfully
> # drbdmanage list-nodes
> +------+-----------+-----------+-------+
> | Name | Pool Size | Pool Free | State |
> |------+-----------+-----------+-------|
> | vm4  |    976632 |     22784 |    ok |
> | vm5  |    953864 |        16 |    ok |
> +------+-----------+-----------+-------+

Any idea of what could cause this ? Perhaps the pending action ? Is there a way
to force this ?

Best regards,
Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Multi-tiering with drbdmanage ?

2016-01-22 Thread Julien Escario
Le 19/01/2016 10:57, Roland Kammerer a écrit :
> On Tue, Jan 19, 2016 at 10:43:11AM +0100, Julien Escario wrote:
>> If not possible now, is it a planned feature ?
> 
> Not possible now, but on the roadmap (for > 1.0.0). 1.0.0 should be out
> soon.
> 
> Regards, rck

Hello,
Thanks. So this is a planned feature, great !

Regards,
Julien





___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Multi-tiering with drbdmanage ?

2016-01-19 Thread Julien Escario
Hello,
We're extensively trying drbdmanage. We're still using v0.5 but considering
trying 0.9 soon.

Just a quick question about multi-tiering : is it possible to manage 2 different
pools (one with HDDs, one with SSDs) with drbdmanage ? It's mainly about having 2
different LVM VGs as backend devices and being able to switch between them when
creating a resource.

If not possible now, is it a planned feature ?

Regards,
Julien



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD9 Split-brain : primary goes to outdated ?

2015-12-08 Thread Julien Escario
Hello,
I'm currently trying some setups with DRBD9.


Today, I simulated a network failure between two nodes using protocol A, with two
resources created using DRBDmanage 0.5.
For this, I disabled and re-enabled a switch port on NodeB.

Both resources were in sync and only NodeA was primary. NodeB was secondary :
vm-102-disk-1 role:Primary
  disk:UpToDate
  NodeB role:Secondary
peer-disk:UpToDate

vm-102-disk-2 role:Primary
  disk:UpToDate
  NodeB role:Secondary
peer-disk:UpToDate

My problem is that resources on NodeA (the primary) became outdated and
resources on NodeB were primary when the disconnection occurred.

I think this is not the expected behavior, right ? (or perhaps it is, for a reason
unknown to me).

Below some debug infos.

Thanks for your advices,
Julien

Immediately after disconnect on NodeA :
# drbdadm status
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  NodeB role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate

vm-102-disk-1 role:Primary
  disk:Outdated
  NodeB connection:StandAlone

vm-102-disk-2 role:Primary
  disk:Outdated
  NodeB connection:StandAlone

And on NodeB :
# drbdadm status
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  NodeA role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate

vm-102-disk-1 role:Secondary
  disk:UpToDate
  NodeA connection:StandAlone

vm-102-disk-2 role:Secondary
  disk:UpToDate
  NodeA connection:StandAlone


Full log about ressource vm-102-disk-1 (including resync) can be found here :
http://pastebin.com/L2SgaKGs



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD module crash with KVM avec LVM

2015-02-18 Thread Julien Escario

Hello,
We are currently experiencing a strange problem with DRBD :
A few days ago, we got a crash of the drbd with :
Jan 21 12:13:40 dedie58 ntpd[926790]: ntp engine ready
Jan 21 12:13:41 dedie58 kernel: block drbd1: drbd1_receiver[2910] Concurrent 
local write detected!new: 2253364088s +4096; pending: 2253364088s +4096
Jan 21 12:13:41 dedie58 kernel: block drbd1: Concurrent write! [W AFTERWARDS] 
sec=2253364088s


It happened on node1 just after I synced the time on both nodes (yeah, I won't
repeat this). At this time, we had VMs running, spread across both nodes,
for a few days. No VM was running on both nodes at the same time.


So we rebooted node1, launched all VMs on node2 and asked for a full resync of
the DRBD device, which took 5 days (disks are 4 TB, 7.2k rpm).


So we thought everything was back to normal and we moved a non-important
VM back to node1. It ran as expected for about 8 hours and finally the VM crashed
around 06:30 AM with the call trace below.


I checked about fencing, nothing in logs on any node.

For the background :
Two Proxmox 3 hypervisors running Linux KVM VMs with LVM disks over a DRBD over 
a software RAID device.


For the versions :

# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 
2012-10-09 12:47:51


 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-
ns:0 nr:286670783 dw:286670763 dr:8447304 al:0 bm:31 lo:2 pe:0 ua:3 ap:0 
ep:2 wo:b oos:0


It seems we lost our VM a few minutes before :

Feb 13 06:36:23 dedie58 pvestatd[3883]: WARNING: unable to connect to VM 106 
socket - timeout after 31 retries


Nothing else seems relevant in the logs, only cron tasks, and they run every
hour so I don't think they have anything to do with it.


The hour is giving me a hint : the VM is a debian one and daily cron jobs are 
running around 06:30 AM. This is the only clue I actually have and this can't 
really explain this crash.


Does someone have any idea about what could cause this ?
Of course, I can run some tests by moving the VM back to node1 with more
logging activated.


Thanks for your reading and help,
Julien Escario

And finally the call trace :

Feb 13 06:38:27 dedie58 kernel: INFO: task kvm:820630 blocked for more than 120 
seconds.
Feb 13 06:38:27 dedie58 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 13 06:38:27 dedie58 kernel: kvm   D 88186ad84740 0 820630   
   10 0x
Feb 13 06:38:27 dedie58 kernel: 8817ebe255f8 0086 
 00051200
Feb 13 06:38:27 dedie58 kernel: 00011200 00011210 
8817ebe25578 0010
Feb 13 06:38:27 dedie58 kernel: 881873256a80 00013232ac6f 
8817ebe25fd8 8817ebe25fd8
Feb 13 06:38:27 dedie58 kernel: Call Trace:
Feb 13 06:38:27 dedie58 kernel: [] 
drbd_al_begin_io+0x195/0x220 [drbd]
Feb 13 06:38:27 dedie58 kernel: [] ? 
autoremove_wake_function+0x0/0x40
Feb 13 06:38:27 dedie58 kernel: [] ? __bio_clone+0x26/0x70
Feb 13 06:38:27 dedie58 kernel: [] 
drbd_make_request_common+0x1298/0x1870 [drbd]
Feb 13 06:38:27 dedie58 kernel: [] ? 
mempool_alloc_slab+0x15/0x20
Feb 13 06:38:27 dedie58 kernel: [] ? mempool_alloc+0x73/0x180
Feb 13 06:38:27 dedie58 kernel: [] ? bvec_alloc_bs+0x6a/0x120
Feb 13 06:38:27 dedie58 kernel: [] ? 
bio_alloc_bioset+0xb2/0xf0
Feb 13 06:38:27 dedie58 kernel: [] 
drbd_make_request+0x4ba/0x12a0 [drbd]
Feb 13 06:38:27 dedie58 kernel: [] ? 
__split_and_process_bio+0x47f/0x600
Feb 13 06:38:27 dedie58 kernel: [] ? 
md_make_request+0xcd/0x1f0
Feb 13 06:38:27 dedie58 kernel: [] ? throtl_find_tg+0x44/0x60
Feb 13 06:38:27 dedie58 kernel: [] ? 
blk_throtl_bio+0x3ce/0x5c0
Feb 13 06:38:27 dedie58 kernel: [] 
generic_make_request+0x265/0x3b0
Feb 13 06:38:27 dedie58 kernel: [] submit_bio+0x8d/0x1a0
Feb 13 06:38:27 dedie58 kernel: [] dio_bio_submit+0xa8/0xc0
Feb 13 06:38:27 dedie58 kernel: [] 
__blockdev_direct_IO_newtrunc+0x98a/0xce0
Feb 13 06:38:27 dedie58 kernel: [] 
__blockdev_direct_IO+0x5c/0xd0
Feb 13 06:38:27 dedie58 kernel: [] ? 
blkdev_get_blocks+0x0/0xd0
Feb 13 06:38:27 dedie58 kernel: [] blkdev_direct_IO+0x57/0x60
Feb 13 06:38:27 dedie58 kernel: [] ? 
blkdev_get_blocks+0x0/0xd0
Feb 13 06:38:27 dedie58 kernel: [] 
mapping_direct_IO.isra.25+0x48/0x70
Feb 13 06:38:27 dedie58 kernel: [] 
generic_file_direct_write_iter+0xef/0x170
Feb 13 06:38:27 dedie58 kernel: [] 
__generic_file_write_iter+0x320/0x3e0
Feb 13 06:38:27 dedie58 kernel: [] ? 
kvm_arch_vcpu_put+0x4b/0x60 [kvm]
Feb 13 06:38:27 dedie58 kernel: [] ? vcpu_put+0x28/0x30 [kvm]
Feb 13 06:38:27 dedie58 kernel: [] 
__generic_file_aio_write+0x83/0xa0
Feb 13 06:38:27 dedie58 kernel: [] blkdev_aio_write+0x71/0x100
Feb 13 06:38:27 dedie58 kernel: [] 
aio_rw_vect_retry+0xa7/0x240
Feb 13 06:38:27 dedie58 kernel: [] aio_run_iocb+0x61/0x150
Feb 13 06:38:27 dedie58 kernel: [] do_io_submit+0x2a0/0x630
Feb 13 06:38:27

Re: [DRBD-user] multiple DRBD + solid state drive + 10G Ethernet performance tuning. Help!!

2010-10-29 Thread Julien Escario

I asked for the same thing (without SSD) a few weeks ago.

Someone answered me that these performances are perfectly normal in a dual-master
configuration.
It seems to be due to the network latency (first server I/O + network latency +
second server I/O + network latency (ACK)).
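
Back-of-the-envelope (numbers made up, only the order of magnitude matters) : with
oflag=direct there is a single 4k write in flight at a time, so throughput is roughly
block size / per-write latency. With ~0.4 ms in total per write (local I/O + RTT +
remote I/O + ACK), that's 4 KiB / 0.0004 s, around 10 MB/s - the same order of
magnitude as the ~8 MB/s reported below.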


I finally decided that DRBD is unusable in a dual-primary setup because of the
performance drop.


Julien

Le 29/10/2010 18:09, wang xuchen a écrit :

Hi all,

I have encountered a DRBD write performance bottleneck issue.

According to DRBD specification "DRBD then reduces that throughput maximum by
its additional throughput overhead, which can be expected to be less than 3
percent."

My current test environment is:

(1) Hard-drive:  300G SSD with 8 partitions on it, each of which has a DRBD
device create on top it. I use dd utility to test its performance: 97 MB/s with
4k block size.


(2) netowork: dedicated 10G ethernet card for data replication:
ethtool eth2
Settings for eth2:
...
 Speed: 1Mb/s
...

(3) DRBD configuration: (Here is one of them).

 on Server1 {
 device   /dev/drbd3 minor 3;
 disk /dev/fioa3;
         address  ipv4 192.168.202.107:7793;
 meta-diskinternal;
 }
 on NSS_108 {
 device   /dev/drbd3 minor 3;
 disk /dev/fioa3;
         address  ipv4 192.168.202.108:7793;
 meta-diskinternal;
 }
 net {
 allow-two-primaries;
 after-sb-0pridiscard-zero-changes;
 after-sb-1priconsensus;
 after-sb-2pricall-pri-lost-after-sb;
 rr-conflict  disconnect;
 max-buffers  4000;
 max-epoch-size   16000;
 unplug-watermark 4000;
 sndbuf-size   2M;
 data-integrity-alg crc32c;
 }
 syncer {
 rate 300M;
 csums-algmd5;
 verify-alg   crc32c;
 al-extents   3800;
 cpu-mask   2;
 }
}

(4) Test result:

I have a simple script which use multiple instance of dd to their corresponding
DRBD device

dd if=/dev/zero of=/dev/drbd1 bs=4k count=1 oflag=direct &


For one device, I got roughly 8M/s. As the test goes, I increase the number of
device to see if it helps the performance. Unfortunately, as the number of
device grows, performance seems to be distributed on each of the device with the
total add up to 10M/s.

Can somebody give me a hint on what was going wrong?

Many Thanks.
Ben


Commit yourself to constant self-improvement

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Performance issue

2010-05-14 Thread Julien Escario

Hello,
I'm trying to optimize write performances on a primary/primary drbd cluster.

The nodes are connected directly with a dedicated gigabit network.

To get rid of filesystem effects, I'm addressing the drbd resource directly.

Here's what dd gives me (I'm using one big chunk to get rid of latency) :

# dd if=/dev/zero of=/dev/drbd0 bs=512M count=1 oflag=direct
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 30.3065 s, 17.7 MB/s

Now, I disconnect the ressource :
# drbdadm disconnect r0

And I have very better perfs :
# dd if=/dev/zero of=/dev/drbd0 bs=512M count=1 oflag=direct
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 8.6145 s, 62.3 MB/s

When syncing, I get :
finish: 3:07:23 speed: 61,240 (55,320) K/sec

Which seems around the real performance.

The version is :
version: 8.0.16 (api:86/proto:86)

Do you have any clue about what is giving me this 3.5 factor ?

What other information can I give you ?

Thanks for your help,
Julien Escario
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user