[ceph-users] Operations: cannot update immutable features

2023-06-08 Thread Adam Boyhan
I have a small cluster on Pacific with roughly 600 RBD images. Out of those 
600 images, I have 2 that are in a somewhat odd state.

root@cephmon:~# rbd info Cloud-Ceph1/vm-134-disk-0
rbd image 'vm-134-disk-0':
size 1000 GiB in 256000 objects
order 22 (4 MiB objects)
snapshot_count: 11
id: 7c326b8b4567
block_name_prefix: rbd_data.7c326b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, operations
op_features: snap-trash
flags:
create_timestamp: Fri Aug 14 07:11:44 2020
access_timestamp: Thu Jun  8 06:31:06 2023
modify_timestamp: Thu Jun  8 06:31:11 2023

Specifically, the feature "operations". I never set this feature, and I don't 
see it listed in the documentation.

This feature is preventing my backup software from backing up the RBD. 
Otherwise, everything is working fine.

I did attempt to remove the feature.

root@cephmon:~# rbd feature disable Cloud-Ceph1/vm-464-disk-0 operations
rbd: failed to update image features: 2023-06-08T07:50:21.899-0400 7fdea52ae340 
-1 librbd::Operations: cannot update immutable features
(22) Invalid argument
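
(A hedged note: as far as I can tell, the "operations" feature bit is reported 
automatically while op_features such as snap-trash are active, which would explain 
why it cannot be disabled directly. A sketch of how one might look for whatever is 
keeping snap-trash set, using only commands already shown above and the same image 
name:

root@cephmon:~# rbd snap ls --all Cloud-Ceph1/vm-134-disk-0

A snapshot sitting in the "trash" namespace, typically left behind by a removed 
clone-v2 parent snapshot that still has children, would account for the snap-trash 
op_feature.)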

Any help or input is greatly appreciated.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Adam Boyhan
I have been doing some testing with RBD-Mirror snapshots to a remote Ceph 
cluster. 

Does anyone know if the images on the remote cluster can be utilized in any way? 
I would love the ability to clone them; even read-only access would be nice. 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Adam Boyhan
Two separate 4-node clusters with 10 OSDs in each node. Micron 9300 NVMe drives 
are the OSDs. Heavily based on the Micron/Supermicro white papers. 

When I attempt to protect the snapshot on a remote image, it fails with a 
read-only error. 

root@Bunkcephmon2:~# rbd snap protect CephTestPool1/vm-100-disk-0@TestSnapper1 
rbd: protecting snap failed: (30) Read-only file system 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Adam Boyhan
That's what I thought as well, especially based on this. 



Note 

You may clone a snapshot from one pool to an image in another pool. For 
example, you may maintain read-only images and snapshots as templates in one 
pool, and writeable clones in another pool. 

root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool2/vm-100-disk-0-CLONE 
2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
0x55c7cf8417f0 validate_parent: parent snapshot must be protected 

root@Bunkcephmon2:~# rbd snap protect CephTestPool1/vm-100-disk-0@TestSnapper1 
rbd: protecting snap failed: (30) Read-only file system 


From: "Eugen Block"  
To: "adamb"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Wednesday, January 20, 2021 3:00:54 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

But you should be able to clone the mirrored snapshot on the remote 
cluster even though it’s not protected, IIRC. 


Quoting Adam Boyhan : 

> Two separate 4 node clusters with 10 OSD's in each node. Micron 9300 
> NVMe's are the OSD drives. Heavily based on the Micron/Supermicro 
> white papers. 
> 
> When I attempt to protect the snapshot on a remote image, it errors 
> with read only. 
> 
> root@Bunkcephmon2:~# rbd snap protect 
> CephTestPool1/vm-100-disk-0@TestSnapper1 
> rbd: protecting snap failed: (30) Read-only file system 
> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Adam Boyhan
Awesome information. I knew I had to be missing something. 

All of my clients will be far newer than Mimic, so I don't think that will be an 
issue. 

Added the following to my ceph.conf on both clusters. 

rbd_default_clone_format = 2 

root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool2/vm-100-disk-0-CLONE 
root@Bunkcephmon2:~# rbd ls CephTestPool2 
vm-100-disk-0-CLONE 
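
(Untested here, just a guess based on the centralized config store: instead of 
editing ceph.conf on every node, it may also be possible to set the option once 
per cluster, e.g.:

root@Bunkcephmon2:~# ceph config set client rbd_default_clone_format 2 )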

I am sure I will be back with more questions. Hoping to replace our Nimble 
storage with Ceph and NVMe. 

Appreciate it! 


From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Wednesday, January 20, 2021 3:28:39 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Wed, Jan 20, 2021 at 3:10 PM Adam Boyhan  wrote: 
> 
> That's what I though as well, specially based on this. 
> 
> 
> 
> Note 
> 
> You may clone a snapshot from one pool to an image in another pool. For 
> example, you may maintain read-only images and snapshots as templates in one 
> pool, and writeable clones in another pool. 
> 
> root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool2/vm-100-disk-0-CLONE 
> 2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
> 0x55c7cf8417f0 validate_parent: parent snapshot must be protected 
> 
> root@Bunkcephmon2:~# rbd snap protect 
> CephTestPool1/vm-100-disk-0@TestSnapper1 
> rbd: protecting snap failed: (30) Read-only file system 

You have two options: (1) protect the snapshot on the primary image so 
that the protection status replicates or (2) utilize RBD clone v2 
which doesn't require protection but does require Mimic or later 
clients [1]. 
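
As a concrete sketch of those two options, reusing the commands from earlier in 
this thread (option 1 runs against the primary cluster; option 2 only needs the 
client-side setting on whichever side does the cloning):

# Option 1: protect on the primary so the protected status replicates 
rbd snap protect CephTestPool1/vm-100-disk-0@TestSnapper1 

# Option 2: clone v2, no protection required (Mimic or later clients), 
# with rbd_default_clone_format = 2 set for the client as done later in the thread 
rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 CephTestPool2/vm-100-disk-0-CLONE 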

> 
> From: "Eugen Block"  
> To: "adamb"  
> Cc: "ceph-users" , "Matt Wilder"  
> Sent: Wednesday, January 20, 2021 3:00:54 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> But you should be able to clone the mirrored snapshot on the remote 
> cluster even though it’s not protected, IIRC. 
> 
> 
> Zitat von Adam Boyhan : 
> 
> > Two separate 4 node clusters with 10 OSD's in each node. Micron 9300 
> > NVMe's are the OSD drives. Heavily based on the Micron/Supermicro 
> > white papers. 
> > 
> > When I attempt to protect the snapshot on a remote image, it errors 
> > with read only. 
> > 
> > root@Bunkcephmon2:~# rbd snap protect 
> > CephTestPool1/vm-100-disk-0@TestSnapper1 
> > rbd: protecting snap failed: (30) Read-only file system 
> > ___ 
> > ceph-users mailing list -- ceph-users@ceph.io 
> > To unsubscribe send an email to ceph-users-le...@ceph.io 
> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 

[1] https://ceph.io/community/new-mimic-simplified-rbd-image-cloning/ 

-- 
Jason 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD-Mirror Mirror Snapshot stuck

2021-01-21 Thread Adam Boyhan
I have an rbd-mirror snapshot on 1 image that failed to replicate, and now it's 
not getting cleaned up. 

The cause of this was my fault, based on my steps. Just trying to understand how 
to clean up/handle the situation. 

Here is how I got into this situation. 

- Created manual rbd snapshot on the image 
- On the remote cluster I cloned the snapshot 
- While cloned on the secondary cluster I made the mistake of deleting the 
snapshot on the primary 
- The subsequent mirror snapshot failed 
- I then removed the clone 
- The next mirror snapshot was successful but I was left with this mirror 
snapshot on the primary that I can't seem to get rid of 

root@Ccscephtest1:/var/log/ceph# rbd snap ls --all CephTestPool1/vm-100-disk-0 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
10082 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.e0c63479-b09e-4c66-a65b-085b67a19907
 2 TiB Thu Jan 21 07:10:09 2021 mirror (primary peer_uuids:[]) 
10243 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.483e55aa-2f64-4bb0-ac0f-7b5aac59830e
 2 TiB Thu Jan 21 07:30:08 2021 mirror (primary 
peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 

I have tried deleting the snap with "rbd snap rm" like normal user-created 
snaps, but no luck. Is there any way to force the deletion? 
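
(For reference, the resync I end up trying in the follow-up below is requested 
from the non-primary side, roughly like this, with the same image name:

root@Bunkcephtest1:~# rbd mirror image resync CephTestPool1/vm-100-disk-0 

Note that a resync re-copies the image, so it is a heavy-handed workaround rather 
than a targeted snapshot removal.)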

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Adam Boyhan
When cloning the snapshot on the remote cluster, I can't see my ext4 filesystem. 

I am using the same exact snapshot on both sides. Shouldn't this be consistent? 

Primary Site 
root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
TestSnapper1 
10621 TestSnapper1 2 TiB Thu Jan 21 08:15:22 2021 user 

root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool1/vm-100-disk-0-CLONE 
root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id admin 
--keyring /etc/ceph/ceph.client.admin.keyring 
/dev/nbd0 
root@Ccscephtest1:~# mount /dev/nbd0 /usr2 

Secondary Site 
root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
TestSnapper1 
10430 TestSnapper1 2 TiB Thu Jan 21 08:20:08 2021 user 

root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool1/vm-100-disk-0-CLONE 
root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id admin 
--keyring /etc/ceph/ceph.client.admin.keyring 
/dev/nbd0 
root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
codepage or helper program, or other error. 
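
(One check that may help before cloning on the secondary, and what gets suggested 
in the reply below, is confirming the mirror snapshot has actually finished 
copying; something along these lines, with the non-primary mirror snapshot 
expected to show "copied" in its namespace column:

root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-0 
root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 )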




From: "adamb"  
To: "dillaman"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Wednesday, January 20, 2021 3:42:46 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

Awesome information. I new I had to be missing something. 

All of my clients will be far newer than mimic so I don't think that will be an 
issue. 

Added the following to my ceph.conf on both clusters. 

rbd_default_clone_format = 2 

root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool2/vm-100-disk-0-CLONE 
root@Bunkcephmon2:~# rbd ls CephTestPool2 
vm-100-disk-0-CLONE 

I am sure I will be back with more questions. Hoping to replace our Nimble 
storage with Ceph and NVMe. 

Appreciate it! 


From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Wednesday, January 20, 2021 3:28:39 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Wed, Jan 20, 2021 at 3:10 PM Adam Boyhan  wrote: 
> 
> That's what I though as well, specially based on this. 
> 
> 
> 
> Note 
> 
> You may clone a snapshot from one pool to an image in another pool. For 
> example, you may maintain read-only images and snapshots as templates in one 
> pool, and writeable clones in another pool. 
> 
> root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool2/vm-100-disk-0-CLONE 
> 2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
> 0x55c7cf8417f0 validate_parent: parent snapshot must be protected 
> 
> root@Bunkcephmon2:~# rbd snap protect 
> CephTestPool1/vm-100-disk-0@TestSnapper1 
> rbd: protecting snap failed: (30) Read-only file system 

You have two options: (1) protect the snapshot on the primary image so 
that the protection status replicates or (2) utilize RBD clone v2 
which doesn't require protection but does require Mimic or later 
clients [1]. 

> 
> From: "Eugen Block"  
> To: "adamb"  
> Cc: "ceph-users" , "Matt Wilder"  
> Sent: Wednesday, January 20, 2021 3:00:54 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> But you should be able to clone the mirrored snapshot on the remote 
> cluster even though it’s not protected, IIRC. 
> 
> 
> Zitat von Adam Boyhan : 
> 
> > Two separate 4 node clusters with 10 OSD's in each node. Micron 9300 
> > NVMe's are the OSD drives. Heavily based on the Micron/Supermicro 
> > white papers. 
> > 
> > When I attempt to protect the snapshot on a remote image, it errors 
> > with read only. 
> > 
> > root@Bunkcephmon2:~# rbd snap protect 
> > CephTestPool1/vm-100-disk-0@TestSnapper1 
> > rbd: protecting snap failed: (30) Read-only file system 
> > ___ 
> > ceph-users mailing list -- ceph-users@ceph.io 
> > To unsubscribe send an email to ceph-users-le...@ceph.io 
> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 

[1] https://ceph.io/community/new-mimic-simplified-rbd-image-cloning/ 

-- 
Jason 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Mirror Snapshot stuck

2021-01-21 Thread Adam Boyhan
I decided to request a resync to see the results. I have a very aggressive 
snapshot mirror schedule of 5 minutes, and replication just keeps starting on the 
latest snapshot before it finishes. Pretty sure this would just loop over and 
over if I didn't remove the schedule. 

root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
10082 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.e0c63479-b09e-4c66-a65b-085b67a19907
 2 TiB Thu Jan 21 07:10:09 2021 mirror (primary peer_uuids:[]) 
10621 TestSnapper1 2 TiB Thu Jan 21 08:15:22 2021 user 
10883 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.7242f4d1-5203-4273-8b6d-ff4e1411216d
 2 TiB Thu Jan 21 08:50:08 2021 mirror (primary peer_uuids:[]) 
10923 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.d0c3c2e7-880b-4e62-90cc-fd501e9a87c9
 2 TiB Thu Jan 21 08:55:11 2021 mirror (primary 
peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 
10963 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.655f7c17-2f85-42e5-9ffe-777a8a48dda3
 2 TiB Thu Jan 21 09:00:09 2021 mirror (primary peer_uuids:[]) 
10993 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.268b960c-51e9-4a60-99b4-c5e7c303fdd8
 2 TiB Thu Jan 21 09:05:25 2021 mirror (primary 
peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 

I have removed the 5-minute schedule for now, but I don't think this should be 
expected behavior. 
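
(For reference, a rough sketch of the schedule commands involved; I believe these 
are the Octopus pool-level MGR scheduler commands, but the exact flags may differ 
by release:

root@Ccscephtest1:~# rbd mirror snapshot schedule ls --pool CephTestPool1 
root@Ccscephtest1:~# rbd mirror snapshot schedule remove --pool CephTestPool1 5m 
root@Ccscephtest1:~# rbd mirror snapshot schedule add --pool CephTestPool1 30m )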


From: "adamb"  
To: "ceph-users"  
Sent: Thursday, January 21, 2021 7:40:01 AM 
Subject: [ceph-users] RBD-Mirror Mirror Snapshot stuck 

I have a rbd-mirror snapshot on 1 image that failed to replicate and now its 
not getting cleaned up. 

The cause of this was my fault based on my steps. Just trying to understand how 
to clean up/handle the situation. 

Here is how I got into this situation. 

- Created manual rbd snapshot on the image 
- On the remote cluster I cloned the snapshot 
- While cloned on the secondary cluster I made the mistake of deleting the 
snapshot on the primary 
- The subsequent mirror snapshot failed 
- I then removed the clone 
- The next mirror snapshot was successful but I was left with this mirror 
snapshot on the primary that I can't seem to get rid of 

root@Ccscephtest1:/var/log/ceph# rbd snap ls --all CephTestPool1/vm-100-disk-0 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
10082 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.e0c63479-b09e-4c66-a65b-085b67a19907
 2 TiB Thu Jan 21 07:10:09 2021 mirror (primary peer_uuids:[]) 
10243 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.483e55aa-2f64-4bb0-ac0f-7b5aac59830e
 2 TiB Thu Jan 21 07:30:08 2021 mirror (primary 
peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 

I have tried deleting the snap with "rbd snap rm" like normal user created 
snaps, but no luck. Anyway to force the deletion? 

___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Adam Boyhan
After the resync finished, I can mount it now. 

root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool1/vm-100-disk-0-CLONE 
root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id admin 
--keyring /etc/ceph/ceph.client.admin.keyring 
/dev/nbd0 
root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 

It makes me a bit nervous how it got into that position while everything appeared 
OK. 


From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Thursday, January 21, 2021 9:25:11 AM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Thu, Jan 21, 2021 at 8:34 AM Adam Boyhan  wrote: 
> 
> When cloning the snapshot on the remote cluster I can't see my ext4 
> filesystem. 
> 
> Using the same exact snapshot on both sides. Shouldn't this be consistent? 

Yes. Has the replication process completed ("rbd mirror image status 
CephTestPool1/vm-100-disk-0")? 

> Primary Site 
> root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> TestSnapper1 
> 10621 TestSnapper1 2 TiB Thu Jan 21 08:15:22 2021 user 
> 
> root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool1/vm-100-disk-0-CLONE 
> root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id admin 
> --keyring /etc/ceph/ceph.client.admin.keyring 
> /dev/nbd0 
> root@Ccscephtest1:~# mount /dev/nbd0 /usr2 
> 
> Secondary Site 
> root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> TestSnapper1 
> 10430 TestSnapper1 2 TiB Thu Jan 21 08:20:08 2021 user 
> 
> root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool1/vm-100-disk-0-CLONE 
> root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> admin --keyring /etc/ceph/ceph.client.admin.keyring 
> /dev/nbd0 
> root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
> mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
> codepage or helper program, or other error. 
> 
> 
> 
>  
> From: "adamb"  
> To: "dillaman"  
> Cc: "Eugen Block" , "ceph-users" , "Matt 
> Wilder"  
> Sent: Wednesday, January 20, 2021 3:42:46 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> Awesome information. I new I had to be missing something. 
> 
> All of my clients will be far newer than mimic so I don't think that will be 
> an issue. 
> 
> Added the following to my ceph.conf on both clusters. 
> 
> rbd_default_clone_format = 2 
> 
> root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool2/vm-100-disk-0-CLONE 
> root@Bunkcephmon2:~# rbd ls CephTestPool2 
> vm-100-disk-0-CLONE 
> 
> I am sure I will be back with more questions. Hoping to replace our Nimble 
> storage with Ceph and NVMe. 
> 
> Appreciate it! 
> 
>  
> From: "Jason Dillaman"  
> To: "adamb"  
> Cc: "Eugen Block" , "ceph-users" , "Matt 
> Wilder"  
> Sent: Wednesday, January 20, 2021 3:28:39 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> On Wed, Jan 20, 2021 at 3:10 PM Adam Boyhan  wrote: 
> > 
> > That's what I though as well, specially based on this. 
> > 
> > 
> > 
> > Note 
> > 
> > You may clone a snapshot from one pool to an image in another pool. For 
> > example, you may maintain read-only images and snapshots as templates in 
> > one pool, and writeable clones in another pool. 
> > 
> > root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > CephTestPool2/vm-100-disk-0-CLONE 
> > 2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
> > 0x55c7cf8417f0 validate_parent: parent snapshot must be protected 
> > 
> > root@Bunkcephmon2:~# rbd snap protect 
> > CephTestPool1/vm-100-disk-0@TestSnapper1 
> > rbd: protecting snap failed: (30) Read-only file system 
> 
> You have two options: (1) protect the snapshot on the primary image so 
> that the protection status replicates or (2) utilize RBD clone v2 
> which doesn't require protection but does require Mimic or later 
> clients [1]. 
> 
> > 
> > From: "Eugen Block"  
> > To: "adamb"  
> > Cc: "ceph-users" , "Matt Wilder" 
> >  
> > Sent: Wednesday, January 20, 2021 3:00:54 PM 
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
>

[ceph-users] RBD-Mirror Snapshot Scalability

2021-01-21 Thread Adam Boyhan
I have noticed that RBD-Mirror snapshot mode can only manage to take 1 snapshot 
per second. For example, I have 21 images in a single pool. When the schedule is 
triggered, it takes the mirror snapshot of each image one at a time. It doesn't 
feel or look like a performance issue, as the OSDs are Micron 9300 PRO NVMe drives 
and each server has 2x Intel Platinum 8268 CPUs. 

I was hoping that adding more RBD-Mirror instances would help, but that only 
seems to help with overall throughput. As it sits, I have 3 RBD-Mirror instances 
running on each cluster. 

We run a 30-minute snapshot schedule to our remote site as it is; based on that, 
I can only squeeze 1800 mirror snaps every 30 minutes. 

I was hoping there might be something I am missing with RBD-Mirror as far as 
scaling goes. 

Maybe multiple pools would be a solution and have other benefits? 
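
(If it helps anyone looking at the same thing, a rough way to watch how the 
scheduler is pacing the images; this assumes the Octopus MGR module's status 
command, and the exact flags may vary:

root@Ccscephtest1:~# rbd mirror snapshot schedule status --pool CephTestPool1 )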




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Adam Boyhan
I was able to trigger the issue again. 

- On the primary I created a snap called TestSnapper for disk vm-100-disk-1 
- Allowed the next RBD-Mirror scheduled snap to complete 
- At this point the snapshot is showing up on the remote side. 

root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1 
vm-100-disk-1: 
global_id: a04e92df-3d64-4dc4-8ac8-eaba17b45403 
state: up+replaying 
description: replaying, 
{"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611247200,"remote_snapshot_timestamp":1611247200,"replay_state":"idle"}
 
service: admin on Bunkcephmon1 
last_update: 2021-01-21 11:46:24 
peer_sites: 
name: ccs 
state: up+stopped 
description: local image is primary 
last_update: 2021-01-21 11:46:28 

root@Ccscephtest1:/etc/pve/priv# rbd snap ls --all CephTestPool1/vm-100-disk-1 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
11532 TestSnapper 2 TiB Thu Jan 21 11:21:25 2021 user 
11573 
.mirror.primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.9525e4eb-41c0-499c-8879-0c7d9576e253
 2 TiB Thu Jan 21 11:35:00 2021 mirror (primary 
peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 

Seems like the sync is complete, So I then clone it, map it and attempt to 
mount it. 

root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper 
CephTestPool1/vm-100-disk-1-CLONE 
root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE --id admin 
--keyring /etc/ceph/ceph.client.admin.keyring 
/dev/nbd0 
root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
codepage or helper program, or other error. 

On the primary still no issues 

root@Ccscephtest1:/etc/pve/priv# rbd clone 
CephTestPool1/vm-100-disk-1@TestSnapper CephTestPool1/vm-100-disk-1-CLONE 
root@Ccscephtest1:/etc/pve/priv# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE 
--id admin --keyring /etc/ceph/ceph.client.admin.keyring 
/dev/nbd0 
root@Ccscephtest1:/etc/pve/priv# mount /dev/nbd0 /usr2 






From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Thursday, January 21, 2021 9:42:26 AM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Thu, Jan 21, 2021 at 9:40 AM Adam Boyhan  wrote: 
> 
> After the resync finished. I can mount it now. 
> 
> root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool1/vm-100-disk-0-CLONE 
> root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> admin --keyring /etc/ceph/ceph.client.admin.keyring 
> /dev/nbd0 
> root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
> 
> Makes me a bit nervous how it got into that position and everything appeared 
> ok. 

We unfortunately need to create the snapshots that are being synced as 
a first step, but perhaps there are some extra guardrails we can put 
on the system to prevent premature usage if the sync status doesn't 
indicate that it's complete. 

>  
> From: "Jason Dillaman"  
> To: "adamb"  
> Cc: "Eugen Block" , "ceph-users" , "Matt 
> Wilder"  
> Sent: Thursday, January 21, 2021 9:25:11 AM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> On Thu, Jan 21, 2021 at 8:34 AM Adam Boyhan  wrote: 
> > 
> > When cloning the snapshot on the remote cluster I can't see my ext4 
> > filesystem. 
> > 
> > Using the same exact snapshot on both sides. Shouldn't this be consistent? 
> 
> Yes. Has the replication process completed ("rbd mirror image status 
> CephTestPool1/vm-100-disk-0")? 
> 
> > Primary Site 
> > root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> > TestSnapper1 
> > 10621 TestSnapper1 2 TiB Thu Jan 21 08:15:22 2021 user 
> > 
> > root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > CephTestPool1/vm-100-disk-0-CLONE 
> > root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> > admin --keyring /etc/ceph/ceph.client.admin.keyring 
> > /dev/nbd0 
> > root@Ccscephtest1:~# mount /dev/nbd0 /usr2 
> > 
> > Secondary Site 
> > root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> > TestSnapper1 
> > 10430 TestSnapper1 2 TiB Thu Jan 21 08:20:08 2021 user 
> > 
> > root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > CephTestPool1/vm-100-disk-0-CLONE 
> > root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> > admin --keyring /etc/ceph/ceph.client.admin.keyring 
> > /dev/nbd0 
> > root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
>

[ceph-users] Re: RBD-Mirror Snapshot Scalability

2021-01-21 Thread Adam Boyhan
Looks like a script and cron will be a solid workaround. 

Still interested to know if there are any options to make it so rbd-mirror can 
take more than 1 mirror snap per second. 
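
To make that concrete, a minimal sketch of the kind of cron-driven script I have 
in mind (hypothetical and untested; the pool name and concurrency limit are just 
placeholders), firing the per-image mirror snapshots in parallel instead of 
waiting on the scheduler: 

#!/bin/bash 
# Take a mirror snapshot of every image in the pool, a handful at a time. 
POOL=CephTestPool1 
for img in $(rbd ls "$POOL"); do 
    rbd mirror image snapshot "$POOL/$img" & 
    # Cap the number of concurrent snapshot requests. 
    while [ "$(jobs -r | wc -l)" -ge 10 ]; do wait -n; done 
done 
wait 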



From: "adamb"  
To: "ceph-users"  
Sent: Thursday, January 21, 2021 11:18:36 AM 
Subject: [ceph-users] RBD-Mirror Snapshot Scalability 

I have noticed that RBD-Mirror snapshot mode can only manage to take 1 snapshot 
per second. For example I have 21 images in a single pool. When the schedule is 
triggered it takes the mirror snapshot of each image 1 at a time. It doesn't 
feel or look like a performance issue as the OSD's are Micron 9300 PRO NVMe's 
and each server has 2x Intel Platinum 8268 CPU's. 

I was hoping that adding more RDB-Mirror instance would help, but that only 
seems to help with overall throughput. As it sits I have 3 RBD-Mirror instances 
running on each cluster. 

We run a 30 minute snapshot schedule to our remote site as it is, based on that 
I can only squeeze 1800 mirror snaps every 30 minutes. 

I was hoping there might be something I am missing with RBD-Mirror as far as 
scaling goes. 

Maybe multiple pools would be a solution and have other benefits? 




___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Scalability

2021-01-21 Thread Adam Boyhan
Let me just start off by saying, I really appreciate all your input so far. It's 
been a huge help! 

Even if it can scale to 10-20 per second, that would make things far more 
viable. Sounds like it shouldn't be much of an issue with the changes you 
mentioned. 

As it sits we have roughly 1300 (and growing) LVM Logical Volumes that we would 
like to move over to Ceph and replicate. These are currently running on one 
large Nimble HPE volume. 

For testing I currently have 2x of the OSD nodes I plan on using in production. 

SuperMicro SYS-1029U-TN10RT 
2x Platinum 8268 2.9GHz CPUs 
10x Micron 9300 PRO 15.36TB drives 
386G RAM 
1x Micron 5300 PRO 960G for OS 

I virtualized each site's cluster using Proxmox. Proxmox is also the OS of the 
OSD/MON VMs. 

The full production setup will have a dedicated 100G network for the private 
network and a dedicated 10G network for the public network. 

If things go well with this testing setup, I hope to have the production 
hardware by June. Possibly moving LV's to RBD's by next year at this time. 

I am ready/eager to help the Ceph project in any way I can. 






From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users"  
Sent: Thursday, January 21, 2021 2:18:06 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Scalability 

On Thu, Jan 21, 2021 at 2:00 PM Adam Boyhan  wrote: 
> 
> Looks like a script and cron will be a solid work around. 
> 
> Still interested to know if there are any options to make it so rbd-mirror 
> can take more than 1 mirror snap per second. 
> 
> 
> 
> From: "adamb"  
> To: "ceph-users"  
> Sent: Thursday, January 21, 2021 11:18:36 AM 
> Subject: [ceph-users] RBD-Mirror Snapshot Scalability 
> 
> I have noticed that RBD-Mirror snapshot mode can only manage to take 1 
> snapshot per second. For example I have 21 images in a single pool. When the 
> schedule is triggered it takes the mirror snapshot of each image 1 at a time. 
> It doesn't feel or look like a performance issue as the OSD's are Micron 9300 
> PRO NVMe's and each server has 2x Intel Platinum 8268 CPU's. 

The creation of snapshot ids is limited by the MONs' quorum process. It 
can issue multiple ids in a single batch, but they all need to be 
queued up. The most recent version of the MGR's RBD mirror snapshot 
scheduler works asynchronously, so it can start multiple snapshots 
concurrently. It's much better, but it still won't scale to hundreds of 
snapshots per second (not that your cluster could keep up even if the 
MONs could). 

> I was hoping that adding more RDB-Mirror instance would help, but that only 
> seems to help with overall throughput. As it sits I have 3 RBD-Mirror 
> instances running on each cluster. 
> 
> We run a 30 minute snapshot schedule to our remote site as it is, based on 
> that I can only squeeze 1800 mirror snaps every 30 minutes. 

Honestly, you might be at the bleeding edge here with an attempt to 
replicate >1,800 images. Getting feedback from deployments like yours 
can help us improve the software since we, realistically, don't have 
the compute resources to easily test at large scale. 

> I was hoping there might be something I am missing with RBD-Mirror as far as 
> scaling goes. 
> 
> Maybe multiple pools would be a solution and have other benefits? 
> 
> 
> 
> 
> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
> 


-- 
Jason 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Adam Boyhan
Sure thing. 

root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-1 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
12192 TestSnapper1 2 TiB Thu Jan 21 14:15:02 2021 user 
12595 
.mirror.non_primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.34c4a53e-9525-446c-8de6-409ea93c5edd
 2 TiB Thu Jan 21 15:05:02 2021 mirror (non-primary peer_uuids:[] 
6c26557e-d011-47b1-8c99-34cf6e0c7f2f:12801 copied) 


root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1 
vm-100-disk-1: 
global_id: a04e92df-3d64-4dc4-8ac8-eaba17b45403 
state: up+replaying 
description: replaying, 
{"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611259501,"remote_snapshot_timestamp":1611259501,"replay_state":"idle"}
 
service: admin on Bunkcephmon1 
last_update: 2021-01-21 15:06:24 
peer_sites: 
name: ccs 
state: up+stopped 
description: local image is primary 
last_update: 2021-01-21 15:06:23 


root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper1 
CephTestPool1/vm-100-disk-1-CLONE 
root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE 
/dev/nbd0 
root@Bunkcephtest1:~# blkid /dev/nbd0 
root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
codepage or helper program, or other error. 


Primary still looks good. 

root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper1 
CephTestPool1/vm-100-disk-1-CLONE 
root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE 
/dev/nbd0 
root@Ccscephtest1:~# blkid /dev/nbd0 
/dev/nbd0: UUID="830b8e05-d5c1-481d-896d-14e21d17017d" TYPE="ext4" 
root@Ccscephtest1:~# mount /dev/nbd0 /usr2 
root@Ccscephtest1:~# cat /proc/mounts | grep nbd0 
/dev/nbd0 /usr2 ext4 rw,relatime 0 0 






From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Thursday, January 21, 2021 3:01:46 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Thu, Jan 21, 2021 at 11:51 AM Adam Boyhan  wrote: 
> 
> I was able to trigger the issue again. 
> 
> - On the primary I created a snap called TestSnapper for disk vm-100-disk-1 
> - Allowed the next RBD-Mirror scheduled snap to complete 
> - At this point the snapshot is showing up on the remote side. 
> 
> root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1 
> vm-100-disk-1: 
> global_id: a04e92df-3d64-4dc4-8ac8-eaba17b45403 
> state: up+replaying 
> description: replaying, 
> {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611247200,"remote_snapshot_timestamp":1611247200,"replay_state":"idle"}
>  
> service: admin on Bunkcephmon1 
> last_update: 2021-01-21 11:46:24 
> peer_sites: 
> name: ccs 
> state: up+stopped 
> description: local image is primary 
> last_update: 2021-01-21 11:46:28 
> 
> root@Ccscephtest1:/etc/pve/priv# rbd snap ls --all 
> CephTestPool1/vm-100-disk-1 
> SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
> 11532 TestSnapper 2 TiB Thu Jan 21 11:21:25 2021 user 
> 11573 
> .mirror.primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.9525e4eb-41c0-499c-8879-0c7d9576e253
>  2 TiB Thu Jan 21 11:35:00 2021 mirror (primary 
> peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 
> 
> Seems like the sync is complete, So I then clone it, map it and attempt to 
> mount it. 

Can you run "snap ls --all" on the non-primary cluster? The 
non-primary snapshot will list its status. On my cluster (with a much 
smaller image): 

# 
# CLUSTER 1 
# 
$ rbd --cluster cluster1 create --size 1G mirror/image1 
$ rbd --cluster cluster1 mirror image enable mirror/image1 snapshot 
Mirroring enabled 
$ rbd --cluster cluster1 device map -t nbd mirror/image1 
/dev/nbd0 
$ mkfs.ext4 /dev/nbd0 
mke2fs 1.45.5 (07-Jan-2020) 
Discarding device blocks: done 
Creating filesystem with 262144 4k blocks and 65536 inodes 
Filesystem UUID: 50e0da12-1f99-4d45-b6e6-5f7a7decaeff 
Superblock backups stored on blocks: 
32768, 98304, 163840, 229376 

Allocating group tables: done 
Writing inode tables: done 
Creating journal (8192 blocks): done 
Writing superblocks and filesystem accounting information: done 
$ blkid /dev/nbd0 
/dev/nbd0: UUID="50e0da12-1f99-4d45-b6e6-5f7a7decaeff" 
BLOCK_SIZE="4096" TYPE="ext4" 
$ rbd --cluster cluster1 snap create mirror/image1@fs 
Creating snap: 100% complete...done. 
$ rbd --cluster cluster1 mirror image snapshot mirror/image1 
Snapshot ID: 6 
$ rbd --cluster cluster1 snap ls --all mirror/image1 
SNAPID NAME 
SIZE PROTECTED TIMESTAMP 
NAMESPACE 
5 fs 
1 GiB Thu Jan 21 14:50:24 2021 
user 
6 
.mirror.primary.f9f692b8-2405-416c-9247-5628e303947a.39722e17-f7e6-4050-acf0-3

[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-22 Thread Adam Boyhan
I have been doing a lot of testing. 

The size of the RBD image doesn't have any effect. 

I run into the issue once I actually write data to the RBD. The more data I 
write out, the larger the chance of reproducing the issue. 

I seem to hit the issue of missing the filesystem altogether the most, but I 
have also had a few instances where some of the data was simply missing. 

I monitor the mirror status on the remote cluster until the snapshot is 100% 
copied and also make sure all the IO is done. My setup has no issue maxing out 
my 10G interconnect during replication, so it's pretty obvious once it's done. 

The only way I have found to resolve the issue is to call a mirror resync on 
the secondary array. 

I can then map the RBD on the primary, write more data to it, snap it again, 
and I am back in the same position. 


From: "adamb"  
To: "dillaman"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Thursday, January 21, 2021 3:11:31 PM 
Subject: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

Sure thing. 

root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-1 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
12192 TestSnapper1 2 TiB Thu Jan 21 14:15:02 2021 user 
12595 
.mirror.non_primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.34c4a53e-9525-446c-8de6-409ea93c5edd
 2 TiB Thu Jan 21 15:05:02 2021 mirror (non-primary peer_uuids:[] 
6c26557e-d011-47b1-8c99-34cf6e0c7f2f:12801 copied) 


root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1 
vm-100-disk-1: 
global_id: a04e92df-3d64-4dc4-8ac8-eaba17b45403 
state: up+replaying 
description: replaying, 
{"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611259501,"remote_snapshot_timestamp":1611259501,"replay_state":"idle"}
 
service: admin on Bunkcephmon1 
last_update: 2021-01-21 15:06:24 
peer_sites: 
name: ccs 
state: up+stopped 
description: local image is primary 
last_update: 2021-01-21 15:06:23 


root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper1 
CephTestPool1/vm-100-disk-1-CLONE 
root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE 
/dev/nbd0 
root@Bunkcephtest1:~# blkid /dev/nbd0 
root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
codepage or helper program, or other error. 


Primary still looks good. 

root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper1 
CephTestPool1/vm-100-disk-1-CLONE 
root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE 
/dev/nbd0 
root@Ccscephtest1:~# blkid /dev/nbd0 
/dev/nbd0: UUID="830b8e05-d5c1-481d-896d-14e21d17017d" TYPE="ext4" 
root@Ccscephtest1:~# mount /dev/nbd0 /usr2 
root@Ccscephtest1:~# cat /proc/mounts | grep nbd0 
/dev/nbd0 /usr2 ext4 rw,relatime 0 0 






From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Thursday, January 21, 2021 3:01:46 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Thu, Jan 21, 2021 at 11:51 AM Adam Boyhan  wrote: 
> 
> I was able to trigger the issue again. 
> 
> - On the primary I created a snap called TestSnapper for disk vm-100-disk-1 
> - Allowed the next RBD-Mirror scheduled snap to complete 
> - At this point the snapshot is showing up on the remote side. 
> 
> root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1 
> vm-100-disk-1: 
> global_id: a04e92df-3d64-4dc4-8ac8-eaba17b45403 
> state: up+replaying 
> description: replaying, 
> {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611247200,"remote_snapshot_timestamp":1611247200,"replay_state":"idle"}
>  
> service: admin on Bunkcephmon1 
> last_update: 2021-01-21 11:46:24 
> peer_sites: 
> name: ccs 
> state: up+stopped 
> description: local image is primary 
> last_update: 2021-01-21 11:46:28 
> 
> root@Ccscephtest1:/etc/pve/priv# rbd snap ls --all 
> CephTestPool1/vm-100-disk-1 
> SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
> 11532 TestSnapper 2 TiB Thu Jan 21 11:21:25 2021 user 
> 11573 
> .mirror.primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.9525e4eb-41c0-499c-8879-0c7d9576e253
>  2 TiB Thu Jan 21 11:35:00 2021 mirror (primary 
> peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 
> 
> Seems like the sync is complete, So I then clone it, map it and attempt to 
> mount it. 

Can you run "snap ls --all" on the non-primary cluster? The 
non-primary snapshot will list its status. On my cluster (with a much 
smaller image): 

# 
# CLUSTER 1 
# 
$ rbd --cluster cluster1 create --size 1G mirror/image1 
$ rbd --cluster cluster1 mirror image enable mirror/image1 sna

[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-22 Thread Adam Boyhan
The steps are pretty straightforward. 

- Create an rbd image of 500G on the primary 
- Enable rbd-mirror snapshot mode on the image 
- Map the image on the primary 
- Format the block device with ext4 
- Mount it and write out 200-300G worth of data (I am using rsync with some 
local real data we have) 
- Unmap the image from the primary 
- Create an rbd snapshot 
- Create an rbd mirror snapshot 
- Wait for the copy process to complete 
- Clone the rbd snapshot on the secondary 
- Map the image on the secondary 
- Try to mount it on the secondary 

Just as a reference, all of my nodes are the same. 

root@Bunkcephtest1:~# ceph --version 
ceph version 15.2.8 (8b89984e92223ec320fb4c70589c39f384c86985) octopus (stable) 

root@Bunkcephtest1:~# dpkg -l | grep rbd-mirror 
ii rbd-mirror 15.2.8-pve2 amd64 Ceph daemon for mirroring RBD images 

This is pretty straightforward; I don't know what I could be missing here. 
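
Since a reproducer script was requested, here is a rough sketch of those steps as 
a script (hypothetical and untested; pool/image names, the data source, and the 
mount point are placeholders, and the clone assumes rbd_default_clone_format = 2 
as set earlier in this thread): 

#!/bin/bash 
POOL=CephTestPool1; IMG=vm-100-disk-1; SNAP=TestSnapper1 

# --- on the primary cluster --- 
rbd create --size 500G "$POOL/$IMG" 
rbd mirror image enable "$POOL/$IMG" snapshot 
DEV=$(rbd-nbd map "$POOL/$IMG") 
mkfs.ext4 "$DEV" 
mount "$DEV" /usr2 
rsync -a /some/local/data/ /usr2/    # write out 200-300G of real data 
umount /usr2 
rbd-nbd unmap "$DEV" 
rbd snap create "$POOL/$IMG@$SNAP" 
rbd mirror image snapshot "$POOL/$IMG" 

# --- on the secondary cluster, after "rbd mirror image status" shows the copy is done --- 
rbd clone "$POOL/$IMG@$SNAP" "$POOL/$IMG-CLONE" 
DEV=$(rbd-nbd map "$POOL/$IMG-CLONE") 
blkid "$DEV" 
mount "$DEV" /usr2 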



From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Friday, January 22, 2021 2:11:36 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

Any chance you could write a small reproducer test script? I can't 
repeat what you are seeing and we do have test cases that really 
hammer random IO on primary images, create snapshots, rinse-and-repeat 
and they haven't turned up anything yet. 

Thanks! 

On Fri, Jan 22, 2021 at 1:50 PM Adam Boyhan  wrote: 
> 
> I have been doing a lot of testing. 
> 
> The size of the RBD image doesn't have any effect. 
> 
> I run into the issue once I actually write data to the rbd. The more data I 
> write out, the larger the chance of reproducing the issue. 
> 
> I seem to hit the issue of missing the filesystem all together the most, but 
> I have also had a few instances where some of the data was simply missing. 
> 
> I monitor the mirror status on the remote cluster until the snapshot is 100% 
> copied and also make sure all the IO is done. My setup has no issue maxing 
> out my 10G interconnect during replication, so its pretty obvious once its 
> done. 
> 
> The only way I have found to resolve the issue is to call a mirror resync on 
> the secondary array. 
> 
> I can then map the rbd on the primary, write more data to it, snap it again, 
> and I am back in the same position. 
> 
>  
> From: "adamb"  
> To: "dillaman"  
> Cc: "ceph-users" , "Matt Wilder"  
> Sent: Thursday, January 21, 2021 3:11:31 PM 
> Subject: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> Sure thing. 
> 
> root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-1 
> SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
> 12192 TestSnapper1 2 TiB Thu Jan 21 14:15:02 2021 user 
> 12595 
> .mirror.non_primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.34c4a53e-9525-446c-8de6-409ea93c5edd
>  2 TiB Thu Jan 21 15:05:02 2021 mirror (non-primary peer_uuids:[] 
> 6c26557e-d011-47b1-8c99-34cf6e0c7f2f:12801 copied) 
> 
> 
> root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1 
> vm-100-disk-1: 
> global_id: a04e92df-3d64-4dc4-8ac8-eaba17b45403 
> state: up+replaying 
> description: replaying, 
> {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611259501,"remote_snapshot_timestamp":1611259501,"replay_state":"idle"}
>  
> service: admin on Bunkcephmon1 
> last_update: 2021-01-21 15:06:24 
> peer_sites: 
> name: ccs 
> state: up+stopped 
> description: local image is primary 
> last_update: 2021-01-21 15:06:23 
> 
> 
> root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper1 
> CephTestPool1/vm-100-disk-1-CLONE 
> root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE 
> /dev/nbd0 
> root@Bunkcephtest1:~# blkid /dev/nbd0 
> root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
> mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
> codepage or helper program, or other error. 
> 
> 
> Primary still looks good. 
> 
> root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper1 
> CephTestPool1/vm-100-disk-1-CLONE 
> root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE 
> /dev/nbd0 
> root@Ccscephtest1:~# blkid /dev/nbd0 
> /dev/nbd0: UUID="830b8e05-d5c1-481d-896d-14e21d17017d" TYPE="ext4" 
> root@Ccscephtest1:~# mount /dev/nbd0 /usr2 
> root@Ccscephtest1:~# cat /proc/mounts | grep nbd0 
> /dev/nbd0 /usr2 ext4 rw,relatime 0 0 
> 
> 
> 
> 
> 
> 
> From: "Jason Dillaman"  
> To: "adamb"  
> Cc: "Eugen Block" , "ceph-users" , "Matt 
> Wilder"  
> 

[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-22 Thread Adam Boyhan
I will have to do some looking into how that is done on Proxmox, but most 
definitely. 



From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Friday, January 22, 2021 3:02:23 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

Any chance you can attempt to repeat the process on the latest master 
or pacific branch clients (no need to upgrade the MONs/OSDs)? 

On Fri, Jan 22, 2021 at 2:32 PM Adam Boyhan  wrote: 
> 
> The steps are pretty straight forward. 
> 
> - Create rbd image of 500G on the primary 
> - Enable rbd-mirror snapshot on the image 
> - Map the image on the primary 
> - Format the block device with ext4 
> - Mount it and write out 200-300G worth of data (I am using rsync with some 
> local real data we have) 
> - Unmap the image from the primary 
> - Create rdb snapshot 
> - Create rdb mirror snapshot 
> - Wait for copy process to complete 
> - Clone the rdb snapshot on secondary 
> - Map the image on secondary 
> - Try to mount on secondary 
> 
> Just as a reference. All of my nodes are the same. 
> 
> root@Bunkcephtest1:~# ceph --version 
> ceph version 15.2.8 (8b89984e92223ec320fb4c70589c39f384c86985) octopus 
> (stable) 
> 
> root@Bunkcephtest1:~# dpkg -l | grep rbd-mirror 
> ii rbd-mirror 15.2.8-pve2 amd64 Ceph daemon for mirroring RBD images 
> 
> This is pretty straight forward, I don't know what I could be missing here. 
> 
> 
>  
> From: "Jason Dillaman"  
> To: "adamb"  
> Cc: "ceph-users" , "Matt Wilder"  
> Sent: Friday, January 22, 2021 2:11:36 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> Any chance you could write a small reproducer test script? I can't 
> repeat what you are seeing and we do have test cases that really 
> hammer random IO on primary images, create snapshots, rinse-and-repeat 
> and they haven't turned up anything yet. 
> 
> Thanks! 
> 
> On Fri, Jan 22, 2021 at 1:50 PM Adam Boyhan  wrote: 
> > 
> > I have been doing a lot of testing. 
> > 
> > The size of the RBD image doesn't have any effect. 
> > 
> > I run into the issue once I actually write data to the rbd. The more data I 
> > write out, the larger the chance of reproducing the issue. 
> > 
> > I seem to hit the issue of missing the filesystem all together the most, 
> > but I have also had a few instances where some of the data was simply 
> > missing. 
> > 
> > I monitor the mirror status on the remote cluster until the snapshot is 
> > 100% copied and also make sure all the IO is done. My setup has no issue 
> > maxing out my 10G interconnect during replication, so its pretty obvious 
> > once its done. 
> > 
> > The only way I have found to resolve the issue is to call a mirror resync 
> > on the secondary array. 
> > 
> > I can then map the rbd on the primary, write more data to it, snap it 
> > again, and I am back in the same position. 
> > 
> >  
> > From: "adamb"  
> > To: "dillaman"  
> > Cc: "ceph-users" , "Matt Wilder" 
> >  
> > Sent: Thursday, January 21, 2021 3:11:31 PM 
> > Subject: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > 
> > Sure thing. 
> > 
> > root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-1 
> > SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
> > 12192 TestSnapper1 2 TiB Thu Jan 21 14:15:02 2021 user 
> > 12595 
> > .mirror.non_primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.34c4a53e-9525-446c-8de6-409ea93c5edd
> >  2 TiB Thu Jan 21 15:05:02 2021 mirror (non-primary peer_uuids:[] 
> > 6c26557e-d011-47b1-8c99-34cf6e0c7f2f:12801 copied) 
> > 
> > 
> > root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1 
> > vm-100-disk-1: 
> > global_id: a04e92df-3d64-4dc4-8ac8-eaba17b45403 
> > state: up+replaying 
> > description: replaying, 
> > {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611259501,"remote_snapshot_timestamp":1611259501,"replay_state":"idle"}
> >  
> > service: admin on Bunkcephmon1 
> > last_update: 2021-01-21 15:06:24 
> > peer_sites: 
> > name: ccs 
> > state: up+stopped 
> > description: local image is primary 
> > last_update: 2021-01-21 15:06:23 
> > 
> > 
> > root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper1 
> > CephTestPool1/v

[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-26 Thread Adam Boyhan
Did some testing with clients running 16.1. I set up two different clients, each 
one dedicated to its respective cluster. Running Proxmox, I compiled the 
latest Pacific 16.1 build. 

root@Ccspacificclient:/cephbuild/ceph/build/bin# ./ceph -v 
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 
ceph version 16.1.0-8-g5f17c37f78 (5f17c37f78a331b7a4bf793890f9d324c64183e5) 
pacific (rc) 

root@Bunkpacificclient:/cephbuild/ceph/build/bin# ./ceph -v 
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 
ceph version 16.1.0-8-g5f17c37f78 (5f17c37f78a331b7a4bf793890f9d324c64183e5) 
pacific (rc) 

Unfortunately, I am hitting the same exact issues using a Pacific client. 

Would this confirm that it's something specific to 15.2.8 on the OSD/MON nodes? 






From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Friday, January 22, 2021 3:44:26 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Fri, Jan 22, 2021 at 3:29 PM Adam Boyhan  wrote: 
> 
> I will have to do some looking into how that is done on Proxmox, but most 
> definitely. 

Thanks, appreciate it. 

>  
> From: "Jason Dillaman"  
> To: "adamb"  
> Cc: "ceph-users" , "Matt Wilder"  
> Sent: Friday, January 22, 2021 3:02:23 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> Any chance you can attempt to repeat the process on the latest master 
> or pacific branch clients (no need to upgrade the MONs/OSDs)? 
> 
> On Fri, Jan 22, 2021 at 2:32 PM Adam Boyhan  wrote: 
> > 
> > The steps are pretty straight forward. 
> > 
> > - Create rbd image of 500G on the primary 
> > - Enable rbd-mirror snapshot on the image 
> > - Map the image on the primary 
> > - Format the block device with ext4 
> > - Mount it and write out 200-300G worth of data (I am using rsync with some 
> > local real data we have) 
> > - Unmap the image from the primary 
> > - Create rdb snapshot 
> > - Create rdb mirror snapshot 
> > - Wait for copy process to complete 
> > - Clone the rdb snapshot on secondary 
> > - Map the image on secondary 
> > - Try to mount on secondary 
> > 
> > Just as a reference. All of my nodes are the same. 
> > 
> > root@Bunkcephtest1:~# ceph --version 
> > ceph version 15.2.8 (8b89984e92223ec320fb4c70589c39f384c86985) octopus 
> > (stable) 
> > 
> > root@Bunkcephtest1:~# dpkg -l | grep rbd-mirror 
> > ii rbd-mirror 15.2.8-pve2 amd64 Ceph daemon for mirroring RBD images 
> > 
> > This is pretty straight forward, I don't know what I could be missing here. 
> > 
> > 
> >  
> > From: "Jason Dillaman"  
> > To: "adamb"  
> > Cc: "ceph-users" , "Matt Wilder" 
> >  
> > Sent: Friday, January 22, 2021 2:11:36 PM 
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > 
> > Any chance you could write a small reproducer test script? I can't 
> > repeat what you are seeing and we do have test cases that really 
> > hammer random IO on primary images, create snapshots, rinse-and-repeat 
> > and they haven't turned up anything yet. 
> > 
> > Thanks! 
> > 
> > On Fri, Jan 22, 2021 at 1:50 PM Adam Boyhan  wrote: 
> > > 
> > > I have been doing a lot of testing. 
> > > 
> > > The size of the RBD image doesn't have any effect. 
> > > 
> > > I run into the issue once I actually write data to the rbd. The more data 
> > > I write out, the larger the chance of reproducing the issue. 
> > > 
> > > I seem to hit the issue of missing the filesystem all together the most, 
> > > but I have also had a few instances where some of the data was simply 
> > > missing. 
> > > 
> > > I monitor the mirror status on the remote cluster until the snapshot is 
> > > 100% copied and also make sure all the IO is done. My setup has no issue 
> > > maxing out my 10G interconnect during replication, so its pretty obvious 
> > > once its done. 
> > > 
> > > The only way I have found to resolve the issue is to call a mirror resync 
> > > on the secondary array. 
> > > 
> > > I can then map the rbd on the primary, write more data to it, snap it 
> > > again, and I am back in the same position. 
> > > 
> > >  
> > > From: "adamb"  
> > > To: "dillaman"  
> > > Cc: "

[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-27 Thread Adam Boyhan
Doing some more testing. 

I can demote the rbd image on the primary, promote it on the secondary, and the 
image looks great. I can map it, mount it, and it looks just like it should. 

However, the rbd snapshots are still unusable on the secondary even when 
promoted. I went as far as taking a 2nd snapshot on the rbd before 
demoting/promoting, and that 2nd snapshot still won't work either. Both 
snapshots look and work great on the primary. 

If I request a resync from the secondary, the rbd snapshots start working just 
like on the primary, but only if I request a resync. 

I get the same exact results whether I am using a 15.2.8 or 16.1 client. 
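
(For reference, the demote/promote in this test was done with the standard mirror 
commands, roughly as follows; demote on the primary cluster, promote on the 
secondary, image name as earlier in the thread:

root@Ccscephtest1:~# rbd mirror image demote CephTestPool1/vm-100-disk-1 
root@Bunkcephtest1:~# rbd mirror image promote CephTestPool1/vm-100-disk-1 )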




From: "adamb"  
To: "dillaman"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Tuesday, January 26, 2021 1:51:13 PM 
Subject: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

Did some testing with clients running 16.1. I setup two different clients, each 
one dedicated to its perspective cluster. Running Proxmox, I compiled the 
latest Pacific 16.1 build. 

root@Ccspacificclient:/cephbuild/ceph/build/bin# ./ceph -v 
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 
ceph version 16.1.0-8-g5f17c37f78 (5f17c37f78a331b7a4bf793890f9d324c64183e5) 
pacific (rc) 

root@Bunkpacificclient:/cephbuild/ceph/build/bin# ./ceph -v 
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 
ceph version 16.1.0-8-g5f17c37f78 (5f17c37f78a331b7a4bf793890f9d324c64183e5) 
pacific (rc) 

Unfortunately, I am hitting the same exact issues using a pacific client. 

Would this confirm that its something specific in 15.2.8 on the osd/mon nodes? 






From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Friday, January 22, 2021 3:44:26 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Fri, Jan 22, 2021 at 3:29 PM Adam Boyhan  wrote: 
> 
> I will have to do some looking into how that is done on Proxmox, but most 
> definitely. 

Thanks, appreciate it. 

>  
> From: "Jason Dillaman"  
> To: "adamb"  
> Cc: "ceph-users" , "Matt Wilder"  
> Sent: Friday, January 22, 2021 3:02:23 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> Any chance you can attempt to repeat the process on the latest master 
> or pacific branch clients (no need to upgrade the MONs/OSDs)? 
> 
> On Fri, Jan 22, 2021 at 2:32 PM Adam Boyhan  wrote: 
> > 
> > The steps are pretty straight forward. 
> > 
> > - Create rbd image of 500G on the primary 
> > - Enable rbd-mirror snapshot on the image 
> > - Map the image on the primary 
> > - Format the block device with ext4 
> > - Mount it and write out 200-300G worth of data (I am using rsync with some 
> > local real data we have) 
> > - Unmap the image from the primary 
> > - Create rbd snapshot 
> > - Create rbd mirror snapshot 
> > - Wait for copy process to complete 
> > - Clone the rbd snapshot on secondary 
> > - Map the image on secondary 
> > - Try to mount on secondary 
> > 
> > Just as a reference. All of my nodes are the same. 
> > 
> > root@Bunkcephtest1:~# ceph --version 
> > ceph version 15.2.8 (8b89984e92223ec320fb4c70589c39f384c86985) octopus 
> > (stable) 
> > 
> > root@Bunkcephtest1:~# dpkg -l | grep rbd-mirror 
> > ii rbd-mirror 15.2.8-pve2 amd64 Ceph daemon for mirroring RBD images 
> > 
> > This is pretty straight forward, I don't know what I could be missing here. 
> > 
> > 
> >  
> > From: "Jason Dillaman"  
> > To: "adamb"  
> > Cc: "ceph-users" , "Matt Wilder" 
> >  
> > Sent: Friday, January 22, 2021 2:11:36 PM 
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > 
> > Any chance you could write a small reproducer test script? I can't 
> > repeat what you are seeing and we do have test cases that really 
> > hammer random IO on primary images, create snapshots, rinse-and-repeat 
> > and they haven't turned up anything yet. 
> > 
> > Thanks! 
> > 
> > On Fri, Jan 22, 2021 at 1:50 PM Adam Boyhan  wrote: 
> > > 
> > > I have been doing a lot of testing. 
> > > 
> > > The size of the RBD image doesn't have any effect. 
> > > 
> > > I run into the issue once I actually write data to the rbd. The more data 
> > > I write out, the larger the chance of reproducing the issue. 
> > > 
> > > I seem to hit the issue of missing the filesystem all together the most, 
> > > but I hav

[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-28 Thread Adam Boyhan
Went through the process 4-5 times now and it's looking good. I am going to 
continue beating on it to make sure. I will report back tomorrow. 

Nice catch! 




From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Thursday, January 28, 2021 12:53:50 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Thu, Jan 28, 2021 at 10:31 AM Jason Dillaman  wrote: 
> 
> On Wed, Jan 27, 2021 at 7:27 AM Adam Boyhan  wrote: 
> > 
> > Doing some more testing. 
> > 
> > I can demote the rbd image on the primary, promote on the secondary and the 
> > image looks great. I can map it, mount it, and it looks just like it 
> > should. 
> > 
> > However, the rbd snapshots are still unusable on the secondary even when 
> > promoted. I went as far as taking a 2nd snapshot on the rbd before 
> > demoting/promoting and that 2nd snapshot still won't work either. Both 
> > snapshots look and work great on the primary. 
> > 
> > If I request a resync from the secondary the rbd snapshots start working 
> > just like the primary, but only if I request a resync. 
> > 
> > I get the same exact results whether I am using a 15.2.8 or 16.1 client. 

Any chance I can get you to retry your test with the object-map / 
fast-diff features disabled on the primary image (the features will be 
propagated to the non-primary image on create / next mirror snapshot)? 
I was able to recreate the issue when the object-map is enabled, and 
I'll hopefully get a fix for that issue up for review today. I just 
want to verify that it is the same issue that you are hitting. 
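
(For anyone following along, a sketch of that retest; the image name is just the 
example used earlier in this thread, and fast-diff has to be disabled along with 
object-map since it depends on it:) 

# disable fast-diff and object-map on the primary image 
rbd feature disable CephTestPool1/vm-100-disk-0 fast-diff object-map 

# confirm the remaining features 
rbd info CephTestPool1/vm-100-disk-0 | grep features 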

> Thanks. I've opened a tracker ticket 
> (https://tracker.ceph.com/issues/49037) for this. 
> 
> > 
> > 
> > 
> >  
> > From: "adamb"  
> > To: "dillaman"  
> > Cc: "ceph-users" , "Matt Wilder" 
> >  
> > Sent: Tuesday, January 26, 2021 1:51:13 PM 
> > Subject: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > 
> > Did some testing with clients running 16.1. I setup two different clients, 
> > each one dedicated to its perspective cluster. Running Proxmox, I compiled 
> > the latest Pacific 16.1 build. 
> > 
> > root@Ccspacificclient:/cephbuild/ceph/build/bin# ./ceph -v 
> > *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 
> > ceph version 16.1.0-8-g5f17c37f78 
> > (5f17c37f78a331b7a4bf793890f9d324c64183e5) pacific (rc) 
> > 
> > root@Bunkpacificclient:/cephbuild/ceph/build/bin# ./ceph -v 
> > *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 
> > ceph version 16.1.0-8-g5f17c37f78 
> > (5f17c37f78a331b7a4bf793890f9d324c64183e5) pacific (rc) 
> > 
> > Unfortunately, I am hitting the same exact issues using a pacific client. 
> > 
> > Would this confirm that its something specific in 15.2.8 on the osd/mon 
> > nodes? 
> > 
> > 
> > 
> > 
> > 
> > 
> > From: "Jason Dillaman"  
> > To: "adamb"  
> > Cc: "ceph-users" , "Matt Wilder" 
> >  
> > Sent: Friday, January 22, 2021 3:44:26 PM 
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > 
> > On Fri, Jan 22, 2021 at 3:29 PM Adam Boyhan  wrote: 
> > > 
> > > I will have to do some looking into how that is done on Proxmox, but most 
> > > definitely. 
> > 
> > Thanks, appreciate it. 
> > 
> > >  
> > > From: "Jason Dillaman"  
> > > To: "adamb"  
> > > Cc: "ceph-users" , "Matt Wilder" 
> > >  
> > > Sent: Friday, January 22, 2021 3:02:23 PM 
> > > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > > 
> > > Any chance you can attempt to repeat the process on the latest master 
> > > or pacific branch clients (no need to upgrade the MONs/OSDs)? 
> > > 
> > > On Fri, Jan 22, 2021 at 2:32 PM Adam Boyhan  wrote: 
> > > > 
> > > > The steps are pretty straight forward. 
> > > > 
> > > > - Create rbd image of 500G on the primary 
> > > > - Enable rbd-mirror snapshot on the image 
> > > > - Map the image on the primary 
> > > > - Format the block device with ext4 
> > > > - Mount it and write out 200-300G worth of data (I am using rsync with 
> > > > some local real data we have) 
> > > > - Unmap the image from the primary 
> > > > 

[ceph-users] Unable to enable RBD-Mirror Snapshot on image when VM is using RBD

2021-01-29 Thread Adam Boyhan
This is an odd one. I don't hit it all the time, so I don't think it's expected 
behavior. 

Sometimes I have no issues enabling rbd-mirror snapshot mode on an RBD when it's 
in use by a KVM VM. Other times I hit the following error, and the only way I 
can get around it is to power down the KVM VM. 

root@Ccscephtest1:~# rbd mirror image enable CephTestPool1/vm-101-disk-0 
snapshot 
2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 
librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f1e7c012440 
handle_create_snapshot: failed to create mirror snapshot: (22) Invalid argument 
2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 librbd::mirror::EnableRequest: 
0x5597667fd200 handle_create_primary_snapshot: failed to create initial primary 
snapshot: (22) Invalid argument 
2021-01-29T09:29:07.875-0500 7f1ea559f3c0 -1 librbd::api::Mirror: image_enable: 
cannot enable mirroring: (22) Invalid argument 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to enable RBD-Mirror Snapshot on image when VM is using RBD

2021-01-29 Thread Adam Boyhan
That makes sense. Appreciate it. 



From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users"  
Sent: Friday, January 29, 2021 9:39:28 AM 
Subject: Re: [ceph-users] Unable to enable RBD-Mirror Snapshot on image when VM 
is using RBD 

On Fri, Jan 29, 2021 at 9:34 AM Adam Boyhan  wrote: 
> 
> This is a odd one. I don't hit it all the time so I don't think its expected 
> behavior. 
> 
> Sometimes I have no issues enabling rbd-mirror snapshot mode on a rbd when 
> its in use by a KVM VM. Other times I hit the following error, the only way I 
> can get around it is to power down the KVM VM. 
> 
> root@Ccscephtest1:~# rbd mirror image enable CephTestPool1/vm-101-disk-0 
> snapshot 
> 2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 
> librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f1e7c012440 
> handle_create_snapshot: failed to create mirror snapshot: (22) Invalid 
> argument 
> 2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 librbd::mirror::EnableRequest: 
> 0x5597667fd200 handle_create_primary_snapshot: failed to create initial 
> primary snapshot: (22) Invalid argument 
> 2021-01-29T09:29:07.875-0500 7f1ea559f3c0 -1 librbd::api::Mirror: 
> image_enable: cannot enable mirroring: (22) Invalid argument 

I suspect that you have the exclusive-lock feature enabled and the 
QEMU hypervisor host has a pre-octopus version of librbd. 
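
(A quick way to check both of those, sketched with the image name from this 
thread; the package query assumes a Debian-based hypervisor:) 

# is exclusive-lock enabled on the image? 
rbd info CephTestPool1/vm-101-disk-0 | grep features 

# which librbd is installed on the QEMU hypervisor host? 
dpkg -l | grep librbd 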

> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
> 


-- 
Jason 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to enable RBD-Mirror Snapshot on image when VM is using RBD

2021-01-29 Thread Adam Boyhan
Spot on. Once I upgraded the client to 15.2.8 I was able to enable rbd-mirror 
snapshots and create them while the VM was running. 

However, I have noticed that I am also able to break replication when the rbd 
is being used by a KVM VM. 

While writing data in the VM, I took an rbd snapshot, then kicked off a mirror 
snapshot. I took the rbd snapshot and the mirror snapshot from one of the OSD 
nodes. 
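
(Those two snapshots were presumably taken with something like the following; a 
sketch using the names that show up in the status output below:) 

# ordinary user snapshot 
rbd snap create CephTestPool1/vm-101-disk-0@Test 

# mirror snapshot, which rbd-mirror then replicates to the peer site 
rbd mirror image snapshot CephTestPool1/vm-101-disk-0 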

Primary 
vm-101-disk-0: 
global_id: 68fe424e-3683-48af-ba39-2b0b4726d918 
state: up+stopped 
description: local image is primary 
service: admin on Ccscephmon1 
last_update: 2021-01-29 10:44:40 
peer_sites: 
name: bunker 
state: up+error 
description: failed to copy snapshots from remote to local image 
last_update: 2021-01-29 10:44:50 

root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-101-disk-0 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
1581 
.mirror.primary.68fe424e-3683-48af-ba39-2b0b4726d918.3f9da1d2-7f23-4a75-865e-3af062be93be
 2 TiB Fri Jan 29 10:28:53 2021 mirror (primary 
peer_uuids:[35c97c1f-a897-4635-abf6-dbd4085c89bd]) 
1584 Test 2 TiB Fri Jan 29 10:37:37 2021 user 
1585 
.mirror.primary.68fe424e-3683-48af-ba39-2b0b4726d918.24744afa-2d9f-4958-8775-c23b0b7f383c
 2 TiB Fri Jan 29 10:37:47 2021 mirror (primary 
peer_uuids:[35c97c1f-a897-4635-abf6-dbd4085c89bd]) 

Secondary 
vm-101-disk-0: 
global_id: 68fe424e-3683-48af-ba39-2b0b4726d918 
state: up+error 
description: failed to copy snapshots from remote to local image 
service: admin on Bunkcephtest1 
last_update: 2021-01-29 10:47:12 
peer_sites: 
name: ccs 
state: up+stopped 
description: local image is primary 
last_update: 2021-01-29 10:47:10 

root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-101-disk-0 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
1552 Test 2 TiB Fri Jan 29 10:28:53 2021 user 
1553 
.mirror.non_primary.68fe424e-3683-48af-ba39-2b0b4726d918.0cd471db-7ed7-4700-8da0-8a82bc2d23a5
 2 TiB Fri Jan 29 10:28:54 2021 mirror (non-primary peer_uuids:[] 
adcedc94-b1a1-4508-8402-db4ec0dc3085:1581 copied) 

I tried restarting the rbd-mirror services but no luck. 

I ended up issuing a resync and that got things moving until I repeated the 
steps again. 

I have the option of using KRBD but not sure if that will help in this 
situation. 




From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users"  
Sent: Friday, January 29, 2021 9:39:28 AM 
Subject: Re: [ceph-users] Unable to enable RBD-Mirror Snapshot on image when VM 
is using RBD 

On Fri, Jan 29, 2021 at 9:34 AM Adam Boyhan  wrote: 
> 
> This is a odd one. I don't hit it all the time so I don't think its expected 
> behavior. 
> 
> Sometimes I have no issues enabling rbd-mirror snapshot mode on a rbd when 
> its in use by a KVM VM. Other times I hit the following error, the only way I 
> can get around it is to power down the KVM VM. 
> 
> root@Ccscephtest1:~# rbd mirror image enable CephTestPool1/vm-101-disk-0 
> snapshot 
> 2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 
> librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f1e7c012440 
> handle_create_snapshot: failed to create mirror snapshot: (22) Invalid 
> argument 
> 2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 librbd::mirror::EnableRequest: 
> 0x5597667fd200 handle_create_primary_snapshot: failed to create initial 
> primary snapshot: (22) Invalid argument 
> 2021-01-29T09:29:07.875-0500 7f1ea559f3c0 -1 librbd::api::Mirror: 
> image_enable: cannot enable mirroring: (22) Invalid argument 

I suspect that you have the exclusive-lock feature enabled and the 
QEMU hypervisor host has a pre-octopus version of librbd. 

> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
> 


-- 
Jason 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-29 Thread Adam Boyhan
I have been hammering on my setup non-stop with only layering, exclusive-lock, 
and deep-flatten enabled. Still can't reproduce the issue. I think this is the 
ticket. 
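
(For completeness, a minimal sketch of creating an image restricted to that 
feature set; the image name and size are placeholders:) 

rbd create CephTestPool1/vm-test-disk-0 --size 500G \
    --image-feature layering,exclusive-lock,deep-flatten 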


From: "adamb"  
To: "dillaman"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Thursday, January 28, 2021 3:37:15 PM 
Subject: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

Went through the process like 4-5 times now and its looking good. I am going to 
continue beating on it to make sure. I will report back tomorrow. 

Nice catch! 




From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Thursday, January 28, 2021 12:53:50 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Thu, Jan 28, 2021 at 10:31 AM Jason Dillaman  wrote: 
> 
> On Wed, Jan 27, 2021 at 7:27 AM Adam Boyhan  wrote: 
> > 
> > Doing some more testing. 
> > 
> > I can demote the rbd image on the primary, promote on the secondary and the 
> > image looks great. I can map it, mount it, and it looks just like it 
> > should. 
> > 
> > However, the rbd snapshots are still unusable on the secondary even when 
> > promoted. I went as far as taking a 2nd snapshot on the rbd before 
> > demoting/promoting and that 2nd snapshot still won't work either. Both 
> > snapshots look and work great on the primary. 
> > 
> > If I request a resync from the secondary the rbd snapshots start working 
> > just like the primary, but only if I request a resync. 
> > 
> > I get the same exact results whether I am using a 15.2.8 or 16.1 client. 

Any chance I can get you to retry your test w/ the object-map / 
fast-diff disabled on the primary image (the features will be 
propagated to the non-primary image on create / next mirror snapshot)? 
I was able to recreate the issue on when the object-map is enabled and 
I'll hopefully get a fix for that issue up for review today. I just 
want to verify that it is the same issue that you are hitting. 

> Thanks. I've opened a tracker ticket 
> (https://tracker.ceph.com/issues/49037) for this. 
> 
> > 
> > 
> > 
> >  
> > From: "adamb"  
> > To: "dillaman"  
> > Cc: "ceph-users" , "Matt Wilder" 
> >  
> > Sent: Tuesday, January 26, 2021 1:51:13 PM 
> > Subject: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > 
> > Did some testing with clients running 16.1. I setup two different clients, 
> > each one dedicated to its perspective cluster. Running Proxmox, I compiled 
> > the latest Pacific 16.1 build. 
> > 
> > root@Ccspacificclient:/cephbuild/ceph/build/bin# ./ceph -v 
> > *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 
> > ceph version 16.1.0-8-g5f17c37f78 
> > (5f17c37f78a331b7a4bf793890f9d324c64183e5) pacific (rc) 
> > 
> > root@Bunkpacificclient:/cephbuild/ceph/build/bin# ./ceph -v 
> > *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 
> > ceph version 16.1.0-8-g5f17c37f78 
> > (5f17c37f78a331b7a4bf793890f9d324c64183e5) pacific (rc) 
> > 
> > Unfortunately, I am hitting the same exact issues using a pacific client. 
> > 
> > Would this confirm that its something specific in 15.2.8 on the osd/mon 
> > nodes? 
> > 
> > 
> > 
> > 
> > 
> > 
> > From: "Jason Dillaman"  
> > To: "adamb"  
> > Cc: "ceph-users" , "Matt Wilder" 
> >  
> > Sent: Friday, January 22, 2021 3:44:26 PM 
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > 
> > On Fri, Jan 22, 2021 at 3:29 PM Adam Boyhan  wrote: 
> > > 
> > > I will have to do some looking into how that is done on Proxmox, but most 
> > > definitely. 
> > 
> > Thanks, appreciate it. 
> > 
> > >  
> > > From: "Jason Dillaman"  
> > > To: "adamb"  
> > > Cc: "ceph-users" , "Matt Wilder" 
> > >  
> > > Sent: Friday, January 22, 2021 3:02:23 PM 
> > > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > > 
> > > Any chance you can attempt to repeat the process on the latest master 
> > > or pacific branch clients (no need to upgrade the MONs/OSDs)? 
> > > 
> > > On Fri, Jan 22, 2021 at 2:32 PM Adam Boyhan  wrote: 
> > > > 
> > > > The steps are pretty straight forward. 
> > > > 
> > > > - Create rbd image of 500G on t

[ceph-users] Re: Unable to enable RBD-Mirror Snapshot on image when VM is using RBD

2021-02-01 Thread Adam Boyhan
Doing some testing, it seems this issue only happens about 10% of the time. I 
haven't really correlated anything that could be the cause. 

A resync gets things going again. 

Is there any way to further determine what this means? 

description: failed to copy snapshots from remote to local image 
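
(Two places I would look, sketched below; the systemd unit name varies with how 
rbd-mirror was deployed, so treat it as an assumption:) 

# per-image mirroring state, including that error description 
rbd mirror image status CephTestPool1/vm-101-disk-0 

# rbd-mirror daemon log on the secondary site (unit name is deployment-specific) 
journalctl -u ceph-rbd-mirror@rbd-mirror.$(hostname -s) --since "1 hour ago" 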




From: "adamb"  
To: "dillaman"  
Cc: "ceph-users"  
Sent: Friday, January 29, 2021 11:00:01 AM 
Subject: [ceph-users] Re: Unable to enable RBD-Mirror Snapshot on image when VM 
is using RBD 

Spot on, once I upgraded the client to 15.2.8 I was able to enable rbd-mirror 
snapshots and create them while the VM was running. 

However, I have noticed that I am also able to break replication when the rbd 
is being used by a KVM VM. 

While writing data in the VM, I took a rbd snapshot, then kicked off a mirror 
snapshot. I took the rbd snapshot and mirror snapshot from one the osd nodes. 

Primary 
vm-101-disk-0: 
global_id: 68fe424e-3683-48af-ba39-2b0b4726d918 
state: up+stopped 
description: local image is primary 
service: admin on Ccscephmon1 
last_update: 2021-01-29 10:44:40 
peer_sites: 
name: bunker 
state: up+error 
description: failed to copy snapshots from remote to local image 
last_update: 2021-01-29 10:44:50 

root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-101-disk-0 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
1581 
.mirror.primary.68fe424e-3683-48af-ba39-2b0b4726d918.3f9da1d2-7f23-4a75-865e-3af062be93be
 2 TiB Fri Jan 29 10:28:53 2021 mirror (primary 
peer_uuids:[35c97c1f-a897-4635-abf6-dbd4085c89bd]) 
1584 Test 2 TiB Fri Jan 29 10:37:37 2021 user 
1585 
.mirror.primary.68fe424e-3683-48af-ba39-2b0b4726d918.24744afa-2d9f-4958-8775-c23b0b7f383c
 2 TiB Fri Jan 29 10:37:47 2021 mirror (primary 
peer_uuids:[35c97c1f-a897-4635-abf6-dbd4085c89bd]) 

Secondary 
vm-101-disk-0: 
global_id: 68fe424e-3683-48af-ba39-2b0b4726d918 
state: up+error 
description: failed to copy snapshots from remote to local image 
service: admin on Bunkcephtest1 
last_update: 2021-01-29 10:47:12 
peer_sites: 
name: ccs 
state: up+stopped 
description: local image is primary 
last_update: 2021-01-29 10:47:10 

root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-101-disk-0 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
1552 Test 2 TiB Fri Jan 29 10:28:53 2021 user 
1553 
.mirror.non_primary.68fe424e-3683-48af-ba39-2b0b4726d918.0cd471db-7ed7-4700-8da0-8a82bc2d23a5
 2 TiB Fri Jan 29 10:28:54 2021 mirror (non-primary peer_uuids:[] 
adcedc94-b1a1-4508-8402-db4ec0dc3085:1581 copied) 

I tried restarting the rbd-mirror services but no luck. 

I ended up issuing a resync and that got things moving until I repeated the 
steps again. 

I have the option of using KRBD but not sure if that will help in this 
situation. 




From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users"  
Sent: Friday, January 29, 2021 9:39:28 AM 
Subject: Re: [ceph-users] Unable to enable RBD-Mirror Snapshot on image when VM 
is using RBD 

On Fri, Jan 29, 2021 at 9:34 AM Adam Boyhan  wrote: 
> 
> This is a odd one. I don't hit it all the time so I don't think its expected 
> behavior. 
> 
> Sometimes I have no issues enabling rbd-mirror snapshot mode on a rbd when 
> its in use by a KVM VM. Other times I hit the following error, the only way I 
> can get around it is to power down the KVM VM. 
> 
> root@Ccscephtest1:~# rbd mirror image enable CephTestPool1/vm-101-disk-0 
> snapshot 
> 2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 
> librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f1e7c012440 
> handle_create_snapshot: failed to create mirror snapshot: (22) Invalid 
> argument 
> 2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 librbd::mirror::EnableRequest: 
> 0x5597667fd200 handle_create_primary_snapshot: failed to create initial 
> primary snapshot: (22) Invalid argument 
> 2021-01-29T09:29:07.875-0500 7f1ea559f3c0 -1 librbd::api::Mirror: 
> image_enable: cannot enable mirroring: (22) Invalid argument 

I suspect that you have the exclusive-lock feature enabled and the 
QEMU hypervisor host has a pre-octopus version of librbd. 

> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
> 


-- 
Jason 
___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-03 Thread Adam Boyhan
Isn't this somewhat reliant on the OSD type? 

Red Hat/Micron/Samsung/Supermicro have all put out white papers backing the idea 
of 2 copies on NVMes as safe for production. 


From: "Magnus HAGDORN"  
To: pse...@avalon.org.ua 
Cc: "ceph-users"  
Sent: Wednesday, February 3, 2021 4:43:08 AM 
Subject: [ceph-users] Re: Worst thing that can happen if I have size= 2 

On Wed, 2021-02-03 at 09:39 +, Max Krasilnikov wrote: 
> > if a OSD becomes unavailble (broken disk, rebooting server) then 
> > all 
> > I/O to the PGs stored on that OSD will block until replication 
> > level of 
> > 2 is reached again. So, for a highly available cluster you need a 
> > replication level of 3 
> 
> 
> AFAIK, with min_size 1 it is possible to write even to only active 
> OSD serving 
> 
yes, that's correct but then you seriously risk trashing your data 

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336. 
___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-03 Thread Adam Boyhan
No problem, they have been around for quite some time now. Even speaking to the 
Ceph engineers over at Supermicro while we spec'd our hardware, they agreed as 
well. 

https://www.supermicro.com/white_paper/white_paper_Ceph-Ultra.pdf 

https://www.redhat.com/cms/managed-files/st-micron-ceph-performance-reference-architecture-f17294-201904-en.pdf 

https://www.samsung.com/semiconductor/global.semi/file/resource/2020/05/redhat-ceph-whitepaper-0521.pdf 






From: dhils...@performair.com 
To: "adamb"  
Cc: "ceph-users"  
Sent: Wednesday, February 3, 2021 10:57:38 AM 
Subject: RE: Worst thing that can happen if I have size= 2 

Adam; 

I'd like to see that / those white papers. 

I suspect what they're advocating is multiple OSD daemon processes per NVMe 
device. This is something which can improve performance. Though I've never done 
it, I believe you partition the device, and then create your OSD pointing at a 
partition. 

Thank you, 

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc. 
dhils...@performair.com 
www.PerformAir.com 

-Original Message- 
From: Adam Boyhan [mailto:ad...@medent.com] 
Sent: Wednesday, February 3, 2021 8:50 AM 
To: Magnus HAGDORN 
Cc: ceph-users 
Subject: [ceph-users] Re: Worst thing that can happen if I have size= 2 

Isn't this somewhat reliant on the OSD type? 

Redhat/Micron/Samsung/Supermicro have all put out white papers backing the idea 
of 2 copies on NVMe's as safe for production. 


From: "Magnus HAGDORN"  
To: pse...@avalon.org.ua 
Cc: "ceph-users"  
Sent: Wednesday, February 3, 2021 4:43:08 AM 
Subject: [ceph-users] Re: Worst thing that can happen if I have size= 2 

On Wed, 2021-02-03 at 09:39 +, Max Krasilnikov wrote: 
> > if a OSD becomes unavailble (broken disk, rebooting server) then 
> > all 
> > I/O to the PGs stored on that OSD will block until replication 
> > level of 
> > 2 is reached again. So, for a highly available cluster you need a 
> > replication level of 3 
> 
> 
> AFAIK, with min_size 1 it is possible to write even to only active 
> OSD serving 
> 
yes, that's correct but then you seriously risk trashing your data 

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336. 
___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 
___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] NVMe and 2x Replica

2021-02-04 Thread Adam Boyhan
I know there are already a few threads about 2x replication, but I wanted to 
start one dedicated to discussion of NVMe. There are some older threads, but 
nothing recent that addresses how the vendors are now pushing the idea of 2x. 

We are in the process of considering Ceph to replace our Nimble setup. We will 
have two completely separate clusters at two different sites, with rbd-mirror 
snapshot replication between them. The plan would be to run 2x replication on 
each cluster. 3x is still an option, but for obvious reasons 2x is enticing. 

Both clusters will closely match the Supermicro example in the white paper 
below. 

It seems all the big vendors feel 2x is safe with NVMe, but I get the feeling 
this community feels otherwise. I am trying to wrap my head around where the 
disconnect is between the big players and the community. I could be missing 
something, but even the Supermicro contact we worked the config out with was in 
agreement with 2x on NVMe. 

Appreciate the input! 

https://www.supermicro.com/white_paper/white_paper_Ceph-Ultra.pdf 

https://www.redhat.com/cms/managed-files/st-micron-ceph-performance-reference-architecture-f17294-201904-en.pdf 

https://www.samsung.com/semiconductor/global.semi/file/resource/2020/05/redhat-ceph-whitepaper-0521.pdf 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread Adam Boyhan
All great input and points, guys. 

Helps me lean towards 3 copies a bit more. 

I mean, honestly, NVMe cost per TB isn't that much more than SATA SSD now. 
Somewhat surprised the salesmen aren't pitching 3x replication, as it makes them 
more money. 



From: "Anthony D'Atri"  
To: "ceph-users"  
Sent: Thursday, February 4, 2021 12:47:27 PM 
Subject: [ceph-users] Re: NVMe and 2x Replica 

> I searched each to find the section where 2x was discussed. What I found was 
> interesting. First, there are really only 2 positions here: Micron's and Red 
> Hat's. Supermicro copies Micron's positon paragraph word for word. Not 
> surprising considering that they are advertising a Supermicro / Micron 
> solution. 

FWIW, at Cephalocon another vendor made a similar claim during a talk. 

* Failure rates are averages, not minima. Some drives will always fail sooner 
* Firmware and other design flaws can result in much higher rates of failure or 
insidious UREs that can result in partial data unavailability or loss 
* Latent soft failures may not be detected until a deep scrub succeeds, which 
could be weeks later 
* In a distributed system, there are up/down/failure scenarios where the 
location of even one good / canonical / latest copy of data is unclear, 
especially when drive or HBA cache is in play. 
* One of these is a power failure. Sure PDU / PSU redundancy helps, but stuff 
happens, like a DC underprovisioning amps, so that a spike in user traffic 
results in the whole row going down :-x Various unpleasant things can happen. 

I was championing R3 even pre-Ceph when I was using ZFS or HBA RAID. As others 
have written, as drives get larger the time to fill them with replica data 
increases, as does the chance of overlapping failures. I’ve experienced R2 
overlapping failures more than once, with and before Ceph. 

My sense has been that not many people run R2 for data they care about, and as 
has been written recently 2,2 EC is safer with the same raw:usable ratio. I’ve 
figured that vendors make R2 statements like these as a selling point to assert 
lower TCO. My first response is often “How much would it cost you directly, and 
indirectly in terms of user / customer goodwill, to lose data?”. 

> Personally, this looks like marketing BS to me. SSD shops want to sell SSDs, 
> but because of the cost difference they have to convince buyers that their 
> products are competitive. 

^this. I’m watching the QLC arena with interest for the potential to narrow the 
CapEx gap. Durability has been one concern, though I’m seeing newer products 
claiming that eg. ZNS improves that. It also seems that there are something 
like what, *4* separate EDSFF / ruler form factors, I really want to embrace 
those eg. for object clusters, but I’m VERY wary of the longevity of competing 
standards and any single-source for chassies or drives. 

> Our products cost twice as much, but LOOK you only need 2/3 as many, and you 
> get all these other benefits (performance). Plus, if you replace everything 
> in 2 or 3 years anyway, then you won't have to worry about them failing. 

Refresh timelines. You’re funny ;) Every time, every single time, that I’ve 
worked in an organization that claims a 3 (or 5, or whatever) hardware refresh 
cycle, it hasn’t happened. When you start getting close, the capex doesn’t 
materialize, or the opex cost of DC hands and operational oversight. “How do 
you know that the drives will start failing or getting slower? Let’s revisit 
this in 6 months”. Etc. 

___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NVMe and 2x Replica

2021-02-05 Thread Adam Boyhan
This turned into a great thread. Lots of good information and clarification. 

I am 100% on board with 3 copies for the primary. 

What does everyone think about possibly only doing 2 copies on the secondary, 
keeping in mind that I would keep min_size=2, which I think is reasonable for a 
secondary site? 
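
(For context, the replication levels being debated here are just per-pool 
settings; a sketch with a placeholder pool name:) 

# secondary cluster: 2 copies, but never accept non-redundant writes 
ceph osd pool set CephTestPool1 size 2 
ceph osd pool set CephTestPool1 min_size 2 

# primary cluster: 3 copies 
ceph osd pool set CephTestPool1 size 3 
ceph osd pool set CephTestPool1 min_size 2 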


From: "Frank Schilder"  
To: "Jack" , "ceph-users"  
Sent: Friday, February 5, 2021 7:14:52 AM 
Subject: [ceph-users] Re: NVMe and 2x Replica 

> Picture this, using size=3, min_size=2: 
> - One node is down for maintenance 
> - You loose a couple of devices 
> - You loose data 
> 
> Is it likely that a nvme device dies during a short maintenance window ? 
> Is it likely that two devices dies at the same time ? 

If you just look at it from this narrow point of view of fundamental laws of 
nature, then, yes, 2+1 is safe. As safe as nuclear power, just looking at the 
laws of physics. So why then did Chernobyl and Fukushima happen? It's because 
it's operated by humans. If you look around, the No. 1 reason for losing data 
on Ceph, or entire clusters, is 2+1. 

Look at the reasons. It's rarely a broken disk. A system designed with no 
redundancy margin for error will suffer from every little admin 
mistake, undetected race condition, bug in Ceph or bug in firmware. So, if the 
savings are worth the sweat, downtime and consultancy budget, why not? 

Ceph has infinite uptime. During such a long period, low-probability events 
will happen with probability 1. 

Best regards, 
= 
Frank Schilder 
AIT Risø Campus 
Bygning 109, rum S14 

 
From: Jack  
Sent: 05 February 2021 12:48:33 
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: NVMe and 2x Replica 

In the end, this is nothing but probability. 

Picture this, using size=3, min_size=2: 
- One node is down for maintenance 
- You lose a couple of devices 
- You lose data 

Is it likely that an NVMe device dies during a short maintenance window? 
Is it likely that two devices die at the same time? 

What are the numbers? 

On 2/5/21 12:26 PM, Wido den Hollander wrote: 
> 
> 
> On 04/02/2021 18:57, Adam Boyhan wrote: 
>> All great input and points guys. 
>> 
>> Helps me lean towards 3 copes a bit more. 
>> 
>> I mean honestly NVMe cost per TB isn't that much more than SATA SSD 
>> now. Somewhat surprised the salesmen aren't pitching 3x replication as 
>> it makes them more money. 
> 
> To add to this, I have seen real cases as a Ceph consultant where size=2 
> and min_size=1 on all flash lead to data loss. 
> 
> Picture this: 
> 
> - One node is down (Maintenance, failure, etc, etc) 
> - NVMe device in other node dies 
> - You loose data 
> 
> Although you can bring back the other node which was down but not broken 
> you are missing data. The data on the NVMe devices in there is outdated 
> and thus the PGs will not become active. 
> 
> size=2 is only safe with min_size=2, but that doesn't really provide HA. 
> 
> The same goes with ZFS in mirror, raidz1, etc. If you loose one device 
> the chances are real you loose the other device before the array has 
> healed itself. 
> 
> With Ceph it's slighly more complex, but the same principles apply. 
> 
> No, with NVMe I still would highly advise against using size=2, min_size=1 
> 
> The question is not if you will loose data, but the question is: When 
> will you loose data? Within one year, 2? 3? 10? 
> 
> Wido 
> 
>> 
>> 
>> 
>> From: "Anthony D'Atri"  
>> To: "ceph-users"  
>> Sent: Thursday, February 4, 2021 12:47:27 PM 
>> Subject: [ceph-users] Re: NVMe and 2x Replica 
>> 
>>> I searched each to find the section where 2x was discussed. What I 
>>> found was interesting. First, there are really only 2 positions here: 
>>> Micron's and Red Hat's. Supermicro copies Micron's positon paragraph 
>>> word for word. Not surprising considering that they are advertising a 
>>> Supermicro / Micron solution. 
>> 
>> FWIW, at Cephalocon another vendor made a similar claim during a talk. 
>> 
>> * Failure rates are averages, not minima. Some drives will always fail 
>> sooner 
>> * Firmware and other design flaws can result in much higher rates of 
>> failure or insidious UREs that can result in partial data 
>> unavailability or loss 
>> * Latent soft failures may not be detected until a deep scrub 
>> succeeds, which could be weeks later 
>> * In a distributed system, there are up/down/failure scenarios where 
>> the location of even one good / canonical / latest copy of data is 
>> unclear, especially when drive or HBA c

[ceph-users] Re: NVMe and 2x Replica

2021-02-05 Thread Adam Boyhan
Those are my thoughts as well. We have 40Gbit/s of dedicated dark fiber that we 
manage between the two sites. 



From: "Frank Schilder"  
To: "adamb"  
Cc: "Jack" , "ceph-users"  
Sent: Friday, February 5, 2021 10:19:06 AM 
Subject: Re: [ceph-users] Re: NVMe and 2x Replica 

I don't run a secondary site and don't know if short windows of read-only 
access are terrible. From the data security point of view, min_size 2 is fine. 
It's the min_size 1 that really is dangerous, because it accepts non-redundant 
writes. 

Even if you lose the second site entirely, you can always re-sync from scratch, 
assuming decent network bandwidth. 

Best regards, 
= 
Frank Schilder 
AIT Risø Campus 
Bygning 109, rum S14 

________ 
From: Adam Boyhan  
Sent: 05 February 2021 13:58:34 
To: Frank Schilder 
Cc: Jack; ceph-users 
Subject: Re: [ceph-users] Re: NVMe and 2x Replica 

This turned into a great thread. Lots of good information and clarification. 

I am 100% on board with 3 copies for the primary. 

What does everyone think about possibly only doing 2 copies on the secondary? 
Keeping in mind that I would keep min=2 which I think will be reasonable for a 
secondary site. 

 
From: "Frank Schilder"  
To: "Jack" , "ceph-users"  
Sent: Friday, February 5, 2021 7:14:52 AM 
Subject: [ceph-users] Re: NVMe and 2x Replica 

> Picture this, using size=3, min_size=2: 
> - One node is down for maintenance 
> - You loose a couple of devices 
> - You loose data 
> 
> Is it likely that a nvme device dies during a short maintenance window ? 
> Is it likely that two devices dies at the same time ? 

If you just look at it from this narrow point of view of fundamental laws of 
nature, then, yes, 2+1 is safe. As safe as is nuclear power just looking at the 
laws of physics. So why then did Chernobyl and Fukushima happen? Its because 
its operated by humans. If you look around, the No. 1 reason for loosing data 
on ceph or entire clusters is 2+1. 

Look at the reasons. Its rarely a broken disk. A system designed with no 
redundancy that offers a margin for error will suffer from every little admin 
mistake, undetected race condition, bug in ceph or bug in firmware. So, if the 
savings are worth the sweat, downtime and consultancy budget, why not? 

Ceph has infinite uptime. During such a long period, low-probability events 
will happen with probability 1. 

Best regards, 
= 
Frank Schilder 
AIT Risø Campus 
Bygning 109, rum S14 

 
From: Jack  
Sent: 05 February 2021 12:48:33 
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: NVMe and 2x Replica 

At the end, this is nothing but a probability stuff 

Picture this, using size=3, min_size=2: 
- One node is down for maintenance 
- You loose a couple of devices 
- You loose data 

Is it likely that a nvme device dies during a short maintenance window ? 
Is it likely that two devices dies at the same time ? 

What are the numbers ? 

On 2/5/21 12:26 PM, Wido den Hollander wrote: 
> 
> 
> On 04/02/2021 18:57, Adam Boyhan wrote: 
>> All great input and points guys. 
>> 
>> Helps me lean towards 3 copes a bit more. 
>> 
>> I mean honestly NVMe cost per TB isn't that much more than SATA SSD 
>> now. Somewhat surprised the salesmen aren't pitching 3x replication as 
>> it makes them more money. 
> 
> To add to this, I have seen real cases as a Ceph consultant where size=2 
> and min_size=1 on all flash lead to data loss. 
> 
> Picture this: 
> 
> - One node is down (Maintenance, failure, etc, etc) 
> - NVMe device in other node dies 
> - You loose data 
> 
> Although you can bring back the other node which was down but not broken 
> you are missing data. The data on the NVMe devices in there is outdated 
> and thus the PGs will not become active. 
> 
> size=2 is only safe with min_size=2, but that doesn't really provide HA. 
> 
> The same goes with ZFS in mirror, raidz1, etc. If you loose one device 
> the chances are real you loose the other device before the array has 
> healed itself. 
> 
> With Ceph it's slighly more complex, but the same principles apply. 
> 
> No, with NVMe I still would highly advise against using size=2, min_size=1 
> 
> The question is not if you will loose data, but the question is: When 
> will you loose data? Within one year, 2? 3? 10? 
> 
> Wido 
> 
>> 
>> 
>> 
>> From: "Anthony D'Atri"  
>> To: "ceph-users"  
>> Sent: Thursday, February 4, 2021 12:47:27 PM 
>> Subject: [ceph-users] Re: NVMe and 2x Replica 
>> 
>>> I searched each to find the section where 2x was discussed. What I 
>>> 

[ceph-users] Re: SUSE POC - Dead in the water

2021-02-16 Thread Adam Boyhan
These guys are great. 

https://croit.io/ 

From: "Schweiss, Chip"  
To: "ceph-users"  
Sent: Tuesday, February 16, 2021 9:42:24 AM 
Subject: [ceph-users] SUSE POC - Dead in the water 

For the past several months I had been building a sizable Ceph cluster that 
will be up to 10PB with between 20 and 40 OSD servers this year. 

A few weeks ago I was informed that SUSE is shutting down SES and will no 
longer be selling it. We haven't licensed our proof of concept cluster 
that is currently at 14 OSD nodes, but it looks like SUSE is not going to 
be the answer here. 

I'm seeking recommendations for consulting help on this project since SUSE 
has let me down. 

I have Ceph installed and operating, however, I've been struggling with 
getting the pool configured properly for CephFS and getting very poor 
performance. The OSD servers have TLC NVMe for DB, and Optane NVMe for 
WAL, so I should be seeing decent performance with the current cluster. 

I'm not opposed to completely switching OS distributions. Ceph on SUSE was 
our first SUSE installation. Almost everything else we run is on CentOS, 
but that may change thanks to IBM cannibalizing CentOS. 

Please reach out to me if you can recommend someone to sell us consulting 
hours and/or a support contract. 

-Chip Schweiss 
chip.schwe...@wustl.edu 
Washington University School of Medicine 
___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Octopus and Snapshot Schedules

2020-10-22 Thread Adam Boyhan
Hey all. 

I was wondering if Ceph Octopus is capable of automating/managing snapshot 
creation/retention and then replication? I've seen some notes about it, but 
can't seem to find anything solid. 

Open to suggestions as well. Appreciate any input! 
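
(For the mirror-snapshot half at least, Octopus can schedule things itself; a 
sketch, assuming snapshot-based mirroring is already enabled on the pool or 
image. Retention of ordinary user snapshots still has to be scripted externally 
as far as I know.) 

# take a mirror snapshot of every mirrored image in the pool every 30 minutes 
rbd mirror snapshot schedule add --pool CephTestPool1 30m 

# or per image 
rbd mirror snapshot schedule add --pool CephTestPool1 --image vm-100-disk-0 30m 

# list the configured schedules 
rbd mirror snapshot schedule ls --pool CephTestPool1 --recursive 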
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus and Snapshot Schedules

2020-10-23 Thread Adam Boyhan
Care to provide anymore detail? 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus and Snapshot Schedules

2020-10-27 Thread Adam Boyhan
Does anyone else have any suggestions or options outside of a separate 
dedicated OS? Seems like this should be something pretty simple and 
straightforward that Ceph is missing. 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus and Snapshot Schedules

2020-10-27 Thread Adam Boyhan
Benji is independent of Ceph. It utilizes Ceph snapshots to do the backups, 
but it has nothing to do with managing Ceph snapshots. 

I am simply looking for the ability to manage Ceph snapshots. For example: take 
a snapshot every 30 minutes and keep 8 of those 30-minute snapshots. 
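
(A minimal sketch of that kind of rotation against the plain rbd CLI; the pool, 
image, name prefix, and retention count are placeholders, and there is no 
locking or error handling:) 

#!/bin/bash 
# take a timestamped snapshot and keep only the newest $KEEP of them 
POOL=CephTestPool1 
IMAGE=vm-100-disk-0 
KEEP=8 

rbd snap create "$POOL/$IMAGE@auto-$(date +%Y%m%d-%H%M)" 

# drop the oldest "auto-" snapshots beyond the retention count 
rbd snap ls "$POOL/$IMAGE" | awk '$2 ~ /^auto-/ {print $2}' | sort | head -n -"$KEEP" | \
  while read -r SNAP; do 
    rbd snap rm "$POOL/$IMAGE@$SNAP" 
  done 

Run from cron every 30 minutes, that gives the 8 x 30-minute window described above. 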
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus and Snapshot Schedules

2020-10-27 Thread Adam Boyhan
I thought Octopus brought the new snapshot replication feature to the table? 
Were there issues with it? 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus and Snapshot Schedules

2020-10-27 Thread Adam Boyhan
That is exactly what I am thinking. My mistake, I should have specified RBD. 

Is snapshot scheduling/retention for RBD already in Octopus as well? 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Can Ceph Do The Job?

2020-01-30 Thread Adam Boyhan
We are looking to roll out an all-flash Ceph cluster as storage for our cloud 
solution. The OSDs will be on slightly slower Micron 5300 PROs, with WAL/DB 
on Micron 7300 MAX NVMes. 

My main concern with Ceph being able to fit the bill is its snapshot abilities. 

For each RBD we would like the following snapshots 

8x 30 minute snapshots (latest 4 hours) 

With our current solution (HPE Nimble) we simply pause all write IO on the 10 
minute mark for roughly 2 seconds and then we take a snapshot of the entire 
Nimble volume. Each VM within the Nimble volume is sitting on a Linux logical 
volume, so it's easy for us to take one big snapshot and only get access to a 
specific client's data. 

Are there any options for automating the management/retention of snapshots 
within Ceph besides some bash scripts? Is there any way to take snapshots of all 
RBDs within a pool at a given time? 

Is there anyone successfully running with this many snapshots? If anyone is 
running a similar setup, I would love to hear how you're doing it. 
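
(There is no built-in "snapshot every RBD in a pool" command that I am aware of; 
the usual approach is a small loop like this sketch, with a placeholder pool 
name. Note the snapshots are taken one after another, not atomically across the 
pool:) 

POOL=CephTestPool1 
STAMP=$(date +%Y%m%d-%H%M) 
for IMG in $(rbd ls "$POOL"); do 
    rbd snap create "$POOL/$IMG@sched-$STAMP" 
done 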
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can Ceph Do The Job?

2020-01-30 Thread Adam Boyhan
It's my understanding that pool snapshots would basically put us in an 
all-or-nothing situation where we would have to revert all RBDs in a pool. If 
we could clone a pool snapshot for filesystem-level access like an rbd snapshot, 
that would help a ton. 
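
(The per-RBD path referred to above looks roughly like this; pool, image, 
snapshot, and mount point names are placeholders:) 

# snapshot a single image, protect it, and clone it for filesystem-level access 
rbd snap create CephTestPool1/vm-100-disk-0@restorepoint 
rbd snap protect CephTestPool1/vm-100-disk-0@restorepoint 
rbd clone CephTestPool1/vm-100-disk-0@restorepoint CephTestPool2/vm-100-disk-0-restore 

# map and mount the writable clone to pull out individual files 
# (assumes /mnt/restore exists and the clone holds a mountable filesystem) 
rbd map CephTestPool2/vm-100-disk-0-restore 
mount /dev/rbd/CephTestPool2/vm-100-disk-0-restore /mnt/restore 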

Thanks, 
Adam Boyhan 
System & Network Administrator 
MEDENT(EMR/EHR) 
15 Hulbert Street - P.O. Box 980 
Auburn, New York 13021 
www.medent.com 
Phone: (315)-255-1751 
Fax: (315)-255-3539 
Cell: (315)-729-2290 
ad...@medent.com 



From: "Janne Johansson"  
To: "adamb"  
Cc: "ceph-users"  
Sent: Thursday, January 30, 2020 10:06:14 AM 
Subject: Re: [ceph-users] Can Ceph Do The Job? 

On Thu, Jan 30, 2020 at 15:29 Adam Boyhan  wrote: 


We are looking to role out a all flash Ceph cluster as storage for our cloud 
solution. The OSD's will be on slightly slower Micron 5300 PRO's, with WAL/DB 
on Micron 7300 MAX NVMe's. 
My main concern with Ceph being able to fit the bill is its snapshot abilities. 
For each RBD we would like the following snapshots 
8x 30 minute snapshots (latest 4 hours) 
With our current solution (HPE Nimble) we simply pause all write IO on the 10 
minute mark for roughly 2 seconds and then we take a snapshot of the entire 
Nimble volume. Each VM within the Nimble volume is sitting on a Linux Logical 
Volume so its easy for us to take one big snapshot and only get access to a 
specific clients data. 
Are there any options for automating managing/retention of snapshots within 
Ceph besides some bash scripts? Is there anyway to take snapshots of all RBD's 
within a pool at a given time? 




You could make a snapshot of the whole pool; that would cover all RBDs in it, I 
gather? 
https://docs.ceph.com/docs/nautilus/rados/operations/pools/#make-a-snapshot-of-a-pool 

But if you need to work in parallel with each snapshot from different times and 
clone them one by one and so forth, doing it per-RBD would be better. 

https://docs.ceph.com/docs/nautilus/rbd/rbd-snapshot/ 

-- 
May the most significant bit of your life be positive. 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Micron SSD/Basic Config

2020-01-31 Thread Adam Boyhan
Looking to roll out an all-flash Ceph cluster. Wanted to see if anyone else is 
using Micron drives, and to get some basic input on my design so far. 

Basic Config 
Ceph OSD Nodes 
8x Supermicro A+ Server 2113S-WTRT 
- AMD EPYC 7601 32 Core 2.2Ghz 
- 256G Ram 
- AOC-S3008L-L8e HBA 
- 10GB SFP+ for client network 
- 40GB QSFP+ for ceph cluster network 

OSD 
10x Micron 5300 PRO 7.68TB in each ceph node 
- 80 total drives across the 8 nodes 

WAL/DB 
5x Micron 7300 MAX NVMe 800GB per Ceph Node 
- Plan on dedicating one NVMe to each two OSDs (see the provisioning sketch below) 
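
(A sketch of how that WAL/DB split would typically be provisioned with 
ceph-volume; the device paths are placeholders and the exact invocation depends 
on the deployment tooling:) 

# two OSDs share each NVMe for DB/WAL: two data devices per --db-devices entry 
ceph-volume lvm batch --bluestore \
    /dev/sda /dev/sdb /dev/sdc /dev/sdd \
    --db-devices /dev/nvme0n1 /dev/nvme1n1 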

Still thinking through the external monitor nodes as I have a lot of options, 
but this is a pretty good start. Open to suggestions as well! 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread Adam Boyhan
Appreciate the input. 

Looking at those articles, they make me feel like the 40G they are talking about 
is 4x bonded 10G connections. 

I'm looking at 40Gbps without bonding for throughput. Is that still the same? 

https://www.fs.com/products/29126.html 

Yep, most of this is based on the white paper, with a few changes here and there. 



From: "EDH - Manuel Rios"  
To: "adamb" , "ceph-users"  
Sent: Friday, January 31, 2020 8:05:52 AM 
Subject: RE: Micron SSD/Basic Config 

Hmm change 40Gbps to 100Gbps networking. 

40Gbps technology is just a bond of 4x 10G links, with some latency due to link 
aggregation. 
100Gbps and 25Gbps have less latency and good performance. In Ceph, roughly 50% 
of the latency comes from network commits and the other 50% from disk commits. 

A fast graph : 
https://blog.mellanox.com/wp-content/uploads/John-Kim-030416-Fig-3a-1024x747.jpg
 
Article: 
https://blog.mellanox.com/2016/03/25-is-the-new-10-50-is-the-new-40-100-is-the-new-amazing/
 

Micron have their own white paper for Ceph, and it looks like it performs fine. 
https://www.micron.com/-/media/client/global/documents/products/other-documents/micron_9200_max_ceph_12,-d-,2,-d-,8_luminous_bluestore_reference_architecture.pdf?la=en
 


As your budget is high, please buy 3x $1.5K nodes for your monitors and you 
will sleep better. They just need 4 cores / 16GB RAM and 2x 128GB SSD or NVMe 
M.2. 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread Adam Boyhan
Ok, so 100G seems to be the better choice. I will probably go with some of 
these. 

https://www.fs.com/products/75808.html 





From: "Paul Emmerich"  
To: "EDH"  
Cc: "adamb" , "ceph-users"  
Sent: Friday, January 31, 2020 8:49:29 AM 
Subject: Re: [ceph-users] Re: Micron SSD/Basic Config 

On Fri, Jan 31, 2020 at 2:06 PM EDH - Manuel Rios 
 wrote: 
> 
> Hmm change 40Gbps to 100Gbps networking. 
> 
> 40Gbps technology its just a bond of 4x10 Links with some latency due link 
> aggregation. 
> 100 Gbps and 25Gbps got less latency and Good performance. In ceph a 50% of 
> the latency comes from Network commits and the other 50% from disk commits. 

40G ethernet is not the same as 4x 10G bond. A bond load balances on a 
per-packet (or well, per flow usually) basis. A 40G link uses all four 
links even for a single packet. 
100G is "just" 4x 25G 

I also wouldn't agree that network and disk latency is a 50/50 split 
in Ceph unless you have some NVRAM disks or something. 

Even for the network speed the processing and queuing in the network 
stack dominates over the serialization delay from a 40G/100G 
difference (4kb at 100G is 320ns, and 800ns at 40G for the 
serialization; I don't have any figures for processing times on 
40/100G ethernet, but 10G fiber is at 300ns, 10G base-t at 2300 
nanoseconds) 

Paul 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io