On Fri, Sep 13, 2019 at 10:41 AM Oliver Freyermuth
<freyerm...@physik.uni-bonn.de> wrote:
>
> Am 13.09.19 um 16:30 schrieb Jason Dillaman:
> > On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman <jdill...@redhat.com> wrote:
> >>
> >> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth
> >> <freyerm...@physik.uni-bonn.de> wrote:
> >>>
> >>> Dear Jason,
> >>>
> >>> thanks for the very detailed explanation! This was very instructive.
> >>> Sadly, the watchers look correct - see details inline.
> >>>
> >>> Am 13.09.19 um 15:02 schrieb Jason Dillaman:
> >>>> On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
> >>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>
> >>>>> Dear Jason,
> >>>>>
> >>>>> thanks for taking care and developing a patch so quickly!
> >>>>>
> >>>>> I have another strange observation to share. In our test setup, only a
> >>>>> single RBD mirroring daemon is running for 51 images.
> >>>>> It works fine with a constant stream of 1-2 MB/s, but at some point
> >>>>> after roughly 20 hours, _all_ images go to this interesting state:
> >>>>> -----------------------------------------
> >>>>> # rbd mirror image status test-vm.XXXXX-disk2
> >>>>> test-vm.XXXXX-disk2:
> >>>>>   global_id:   XXXXXXXXXXXXXXX
> >>>>>   state:       down+replaying
> >>>>>   description: replaying, master_position=[object_number=14, tag_tid=6,
> >>>>>     entry_tid=6338], mirror_position=[object_number=14, tag_tid=6,
> >>>>>     entry_tid=6338], entries_behind_master=0
> >>>>>   last_update: 2019-09-13 03:45:43
> >>>>> -----------------------------------------
> >>>>> Running this command several times, I see entry_tid increasing at both
> >>>>> ends, so mirroring seems to be working just fine.
> >>>>>
> >>>>> However:
> >>>>> -----------------------------------------
> >>>>> # rbd mirror pool status
> >>>>> health: WARNING
> >>>>> images: 51 total
> >>>>>     51 unknown
> >>>>> -----------------------------------------
> >>>>> The health warning is not visible in the dashboard (also not in the
> >>>>> mirroring menu), the daemon still seems to be running, dropped nothing
> >>>>> in the logs, and claims to be "ok" in the dashboard - it's only that
> >>>>> all images show up in unknown state even though everything seems to be
> >>>>> working fine.
> >>>>>
> >>>>> Any idea on how to debug this?
> >>>>> When I restart the rbd-mirror service, all images come back as green.
> >>>>> I already encountered this twice in 3 days.
> >>>>
> >>>> The dashboard relies on the rbd-mirror daemon to provide it errors and
> >>>> warnings. You can see the status reported by rbd-mirror by running
> >>>> "ceph service status":
> >>>>
> >>>> $ ceph service status
> >>>> {
> >>>>     "rbd-mirror": {
> >>>>         "4152": {
> >>>>             "status_stamp": "2019-09-13T08:58:41.937491-0400",
> >>>>             "last_beacon": "2019-09-13T08:58:41.937491-0400",
> >>>>             "status": {
> >>>>                 "json": "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}"
> >>>>             }
> >>>>         }
> >>>>     }
> >>>> }
> >>>>
> >>>> In your case, it seems like rbd-mirror thinks all is good with the
> >>>> world, so it's not reporting any errors.
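(A side note on reading that dump: the inner "json" field is itself a
string-encoded JSON document, so it takes a second decode pass to inspect.
A minimal Python 3 sketch for dumping the per-pool counters - assuming a
ceph CLI in PATH with sufficient credentials; the field names are taken
from the output above:

#!/usr/bin/env python3
# Dump the per-pool rbd-mirror counters that feed the dashboard.
# "ceph service status" already prints JSON, but the inner "status"/"json"
# field is a string-encoded JSON document and needs a second json.loads().
import json
import subprocess

services = json.loads(subprocess.check_output(["ceph", "service", "status"]))
for gid, daemon in services.get("rbd-mirror", {}).items():
    pools = json.loads(daemon["status"]["json"])  # second decode pass
    print("rbd-mirror gid %s, last beacon %s" % (gid, daemon["last_beacon"]))
    for pool_id, st in pools.items():
        print("  pool %s (%s): assigned=%s errors=%s warnings=%s leader=%s"
              % (pool_id, st["name"], st["image_assigned_count"],
                 st["image_error_count"], st["image_warning_count"],
                 st["leader"]))

This is only a convenience for eyeballing the same data the dashboard
consumes.)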
> >>>
> >>> This is indeed the case:
> >>>
> >>> # ceph service status
> >>> {
> >>>     "rbd-mirror": {
> >>>         "84243": {
> >>>             "status_stamp": "2019-09-13 15:40:01.149815",
> >>>             "last_beacon": "2019-09-13 15:40:26.151381",
> >>>             "status": {
> >>>                 "json": "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"84247\",\"leader\":true}}"
> >>>             }
> >>>         }
> >>>     },
> >>>     "rgw": {
> >>>     ...
> >>>     }
> >>> }
> >>>
> >>>> The "down" state indicates that the rbd-mirror daemon isn't correctly
> >>>> watching the "rbd_mirroring" object in the pool. You can see who is
> >>>> watching that object by running the "rados" "listwatchers" command:
> >>>>
> >>>> $ rados -p <pool name> listwatchers rbd_mirroring
> >>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769010788992
> >>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769061031424
> >>>>
> >>>> In my case, the "4154" from "client.4154" is the unique global id for
> >>>> my connection to the cluster, which relates back to the "ceph service
> >>>> status" dump, which also shows status by daemon using the unique
> >>>> global id.
> >>>
> >>> Sadly(?), this looks as expected:
> >>>
> >>> # rados -p rbd listwatchers rbd_mirroring
> >>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139770046978672
> >>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139771389162560
> >>
> >> Hmm, the unique id is different (84243 vs 84247). I wouldn't have
> >> expected the global id to have changed. Did you restart the Ceph
> >> cluster or MONs? Do you see any "peer assigned me a different
> >> global_id" errors in your rbd-mirror logs?
> >>
> >> I'll open a tracker ticket to fix the "ceph service status", though,
> >> since clearly your global id changed but it wasn't noticed by the
> >> service daemon status updater.
> >
> > ... also, can you please provide the output from the following via a
> > pastebin link?
> >
> > # rados -p rbd listomapvals rbd_mirroring
>
> Of course, here you go:
> https://0x0.st/zy8J.txt
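(If this kind of id mismatch needs to be re-checked regularly, the two
views can be compared programmatically. A rough Python 3 sketch that
parses the CLI output formats shown above - comparing watchers against
both the outer service gid and the per-pool "instance_id" is an assumption
on my part, since a daemon legitimately holds more than one connection,
and the pool name "rbd" is assumed:

#!/usr/bin/env python3
# Compare the client ids watching rbd_mirroring with the ids rbd-mirror
# reports via "ceph service status". A watcher id matching neither is the
# 84243-vs-84247 kind of inconsistency discussed above.
import json
import re
import subprocess

watchers = subprocess.check_output(
    ["rados", "-p", "rbd", "listwatchers", "rbd_mirroring"], text=True)
watcher_ids = set(re.findall(r"client\.(\d+)", watchers))

services = json.loads(subprocess.check_output(["ceph", "service", "status"]))
known_ids = set()
for gid, daemon in services.get("rbd-mirror", {}).items():
    known_ids.add(gid)
    for pool in json.loads(daemon["status"]["json"]).values():
        known_ids.add(str(pool.get("instance_id", "")))

print("watcher ids:", sorted(watcher_ids))
print("known ids  :", sorted(known_ids))
for wid in watcher_ids - known_ids:
    print("watcher client.%s is unknown to the service map" % wid)

Treat it as a sanity check only, not an authoritative health test.)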
Thanks. For the case above of global image id
1a53fafa-37ef-4edf-9633-c2ba3323ed93, the on-disk status shows that it was
last updated by client.84247 / nonce 2922488671, which correctly matches
your watcher, so the status should be "up":

status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93
value (232 bytes) :
00000000 01 01 2c 00 00 00 08 17 49 01 00 00 00 00 00 01 |..,.....I.......| <--- "17 49 01 00 00 00 00 00" (84247) is the instance id
00000010 01 01 1c 00 00 00 03 00 00 00 5f a3 31 ae 10 00 |.........._.1...| <--- "5f a3 31 ae" is the nonce (2922488671)
00000020 00 00 02 00 00 00 0a a0 13 f0 00 00 00 00 00 00 |................| <--- "0a a0 13 f0" is the IP address (10.160.19.240)
00000030 00 00 01 01 b0 00 00 00 04 a2 00 00 00 72 65 70 |.............rep|
00000040 6c 61 79 69 6e 67 2c 20 6d 61 73 74 65 72 5f 70 |laying, master_p|
00000050 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f |osition=[object_|
00000060 6e 75 6d 62 65 72 3d 31 39 2c 20 74 61 67 5f 74 |number=19, tag_t|
00000070 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, entry_tid=|
00000080 32 36 34 34 33 5d 2c 20 6d 69 72 72 6f 72 5f 70 |26443], mirror_p|
00000090 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f |osition=[object_|
000000a0 6e 75 6d 62 65 72 3d 31 39 2c 20 74 61 67 5f 74 |number=19, tag_t|
000000b0 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, entry_tid=|
000000c0 32 36 34 34 33 5d 2c 20 65 6e 74 72 69 65 73 5f |26443], entries_|
000000d0 62 65 68 69 6e 64 5f 6d 61 73 74 65 72 3d 30 51 |behind_master=0Q|
000000e0 aa 7b 5d 1b 5f 4f 33 00                         |.{]._O3.|
000000e8

The only thing I can think of is that somehow the watcher entity instance
has a different encoding and it's failing a comparison. Can you restart
rbd-mirror such that the statuses list "up+replaying" and then run the
following?

# rados -p rbd getomapval rbd_mirroring status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93

> Cheers,
> Oliver
>
> >
> >>> However, the dashboard still shows those images in "unknown", and this
> >>> also shows up via command line:
> >>>
> >>> # rbd mirror pool status
> >>> health: WARNING
> >>> images: 51 total
> >>>     51 unknown
> >>> # rbd mirror image status test-vm.physik.uni-bonn.de-disk1
> >>> test-vm.physik.uni-bonn.de-disk2:
> >>>   global_id:   1a53fafa-37ef-4edf-9633-c2ba3323ed93
> >>>   state:       down+replaying
> >>>   description: replaying, master_position=[object_number=18, tag_tid=6,
> >>>     entry_tid=25202], mirror_position=[object_number=18, tag_tid=6,
> >>>     entry_tid=25202], entries_behind_master=0
> >>>   last_update: 2019-09-13 15:55:15
> >>>
> >>> Any ideas on what else could cause this?
> >>>
> >>> Cheers and thanks,
> >>> Oliver
> >>>
> >>>>
> >>>>> Any idea on this (or how I can extract more information)?
> >>>>> I fear keeping high-level debug logs active for ~24h is not feasible.
> >>>>>
> >>>>> Cheers,
> >>>>> Oliver
> >>>>>
> >>>>> On 2019-09-11 19:14, Jason Dillaman wrote:
> >>>>>> On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth
> >>>>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>>>
> >>>>>>> Dear Jason,
> >>>>>>>
> >>>>>>> I played a bit more with rbd mirroring and learned that deleting an
> >>>>>>> image at the source (or disabling journaling on it) immediately
> >>>>>>> moves the image to trash at the target -
> >>>>>>> but setting rbd_mirroring_delete_delay helps to have some more
> >>>>>>> grace time to catch human mistakes.
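(A side note on the omap value decoded in the hexdump further up: the
three annotated fields can be pulled out by hand for comparison against
the watcher line. A rough Python 3 sketch - the offsets are hard-coded to
match that specific dump and struct version, so treat this purely as a
debugging aid, not a general decoder; it also assumes a rados CLI whose
"getomapval" accepts an output file argument, otherwise feed it the bytes
another way:

#!/usr/bin/env python3
# Fetch the raw "status_global_*" omap value and decode the instance id,
# nonce, and IP at the offsets annotated in the hexdump above.
import socket
import struct
import subprocess

KEY = "status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93"
val = subprocess.check_output(
    ["rados", "-p", "rbd", "getomapval", "rbd_mirroring", KEY, "/dev/stdout"])

instance_id = struct.unpack_from("<q", val, 0x07)[0]  # 17 49 01 ... -> 84247
nonce = struct.unpack_from("<I", val, 0x1a)[0]        # 5f a3 31 ae -> 2922488671
addr = socket.inet_ntoa(val[0x26:0x2a])               # 0a a0 13 f0 -> 10.160.19.240
print("last status update from client.%d %s:0/%u" % (instance_id, addr, nonce))

The output should line up with the "watcher=10.160.19.240:0/2922488671
client.84247" lines from listwatchers when the status is healthy.)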
> >>>>>>>
> >>>>>>> However, I have issues restoring such an image which has been moved
> >>>>>>> to trash by the RBD-mirror daemon as user:
> >>>>>>> -----------------------------------
> >>>>>>> [root@mon001 ~]# rbd trash ls -la
> >>>>>>> ID           NAME                             SOURCE    DELETED_AT                STATUS                               PARENT
> >>>>>>> d4fbe8f63905 test-vm-XXXXXXXXXXXXXXXXXX-disk2 MIRRORING Wed Sep 11 18:43:14 2019  protected until Thu Sep 12 18:43:14 2019
> >>>>>>> [root@mon001 ~]# rbd trash restore --image foo-image d4fbe8f63905
> >>>>>>> rbd: restore error: 2019-09-11 18:50:15.387 7f5fa9590b00 -1
> >>>>>>> librbd::api::Trash: restore: Current trash source: mirroring does
> >>>>>>> not match expected: user
> >>>>>>> (22) Invalid argument
> >>>>>>> -----------------------------------
> >>>>>>> This is issued on the mon, which has the client.admin key, so it
> >>>>>>> should not be a permission issue.
> >>>>>>> It also fails when I try that in the Dashboard.
> >>>>>>>
> >>>>>>> Sadly, the error message is not clear enough for me to figure out
> >>>>>>> what could be the problem - do you see what I did wrong?
> >>>>>>
> >>>>>> Good catch, it looks like we accidentally broke this in Nautilus when
> >>>>>> image live-migration support was added. I've opened a new tracker
> >>>>>> ticket to fix this [1].
> >>>>>>
> >>>>>>> Cheers and thanks again,
> >>>>>>> Oliver
> >>>>>>>
> >>>>>>> On 2019-09-10 23:17, Oliver Freyermuth wrote:
> >>>>>>>> Dear Jason,
> >>>>>>>>
> >>>>>>>> On 2019-09-10 23:04, Jason Dillaman wrote:
> >>>>>>>>> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
> >>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Dear Jason,
> >>>>>>>>>>
> >>>>>>>>>> On 2019-09-10 18:50, Jason Dillaman wrote:
> >>>>>>>>>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
> >>>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Dear Cephalopodians,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have two questions about RBD mirroring.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1) I cannot get it to work - my setup is:
> >>>>>>>>>>>>
> >>>>>>>>>>>> - One cluster holding the live RBD volumes and snapshots, in
> >>>>>>>>>>>>   pool "rbd", cluster name "ceph", running latest Mimic.
> >>>>>>>>>>>>   I ran "rbd mirror pool enable rbd pool" on that cluster and
> >>>>>>>>>>>>   created a cephx user "rbd_mirror" with (is there a better way?):
> >>>>>>>>>>>>     ceph auth get-or-create client.rbd_mirror mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow pool rbd r' -o ceph.client.rbd_mirror.keyring --cluster ceph
> >>>>>>>>>>>>   In that pool, two images have the journaling feature
> >>>>>>>>>>>>   activated, all others have it disabled still (so I would
> >>>>>>>>>>>>   expect these two to be mirrored).
> >>>>>>>>>>>
> >>>>>>>>>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for the
> >>>>>>>>>>> caps -- but you definitely need more than read-only permissions
> >>>>>>>>>>> to the remote cluster since it needs to be able to create
> >>>>>>>>>>> snapshots of remote images and update/trim the image journals.
> >>>>>>>>>>
> >>>>>>>>>> these profiles really make life a lot easier. I should have
> >>>>>>>>>> thought of them rather than "guessing" a potentially good
> >>>>>>>>>> configuration...
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> - Another (empty) cluster running latest Nautilus, cluster name
> >>>>>>>>>>>>   "ceph", pool "rbd".
> >>>>>>>>>>>>   I've used the dashboard to activate mirroring for the RBD
> >>>>>>>>>>>>   pool, and then added a peer with cluster name "ceph-virt",
> >>>>>>>>>>>>   cephx-ID "rbd_mirror", filled in the mons and key created
> >>>>>>>>>>>>   above. I've then run:
> >>>>>>>>>>>>     ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow pool rbd rwx' -o client.rbd_mirror_backup.keyring --cluster ceph
> >>>>>>>>>>>>   and deployed that key on the rbd-mirror machine, and started
> >>>>>>>>>>>>   the service with:
> >>>>>>>>>>>
> >>>>>>>>>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your
> >>>>>>>>>>> caps [1].
> >>>>>>>>>>
> >>>>>>>>>> That did the trick (in combination with the above)!
> >>>>>>>>>> Again a case of PEBKAC: I should have read the documentation
> >>>>>>>>>> until the end, clearly my fault.
> >>>>>>>>>>
> >>>>>>>>>> It works well now, even though it seems to run a bit slow (~35
> >>>>>>>>>> MB/s for the initial sync when everything is 1 GBit/s), but that
> >>>>>>>>>> may also be caused by a combination of some very limited hardware
> >>>>>>>>>> on the receiving end (which will be scaled up in the future).
> >>>>>>>>>> A single host with 6 disks, replica 3 and a RAID controller which
> >>>>>>>>>> can only do RAID0 and not JBOD is certainly not ideal, so commit
> >>>>>>>>>> latency may cause this slow bandwidth.
> >>>>>>>>>
> >>>>>>>>> You could try increasing "rbd_concurrent_management_ops" from the
> >>>>>>>>> default of 10 ops to something higher to attempt to account for
> >>>>>>>>> the latency. However, I wouldn't expect near-line speed w/ RBD
> >>>>>>>>> mirroring.
> >>>>>>>>
> >>>>>>>> Thanks - I will play with this option once we have more storage
> >>>>>>>> available in the target pool ;-).
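(For reference, on a Nautilus-era cluster this option can be set via the
central config store, e.g.

# ceph config set client rbd_concurrent_management_ops 20

or as "rbd concurrent management ops = 20" in the [client] section of
ceph.conf on the rbd-mirror host. The "client" section name is an
assumption and may need to match the daemon's actual client name, and
whether a running rbd-mirror picks the change up without a restart may
vary.)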
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>     systemctl start ceph-rbd-mirror@rbd_mirror_backup.service
> >>>>>>>>>>>>
> >>>>>>>>>>>>   After this, everything looks fine:
> >>>>>>>>>>>>     # rbd mirror pool info
> >>>>>>>>>>>>     Mode: pool
> >>>>>>>>>>>>     Peers:
> >>>>>>>>>>>>       UUID        NAME      CLIENT
> >>>>>>>>>>>>       XXXXXXXXXXX ceph-virt client.rbd_mirror
> >>>>>>>>>>>>
> >>>>>>>>>>>>   The service also seems to start fine, but logs show (debug
> >>>>>>>>>>>>   rbd_mirror=20):
> >>>>>>>>>>>>
> >>>>>>>>>>>>     rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys: retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXXXXXXXXXX
> >>>>>>>>>>>>     rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
> >>>>>>>>>>>>     rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting failed pool replayer for uuid: XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror
> >>>>>>>>>>>>     rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid: XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror
> >>>>>>>>>>>>     rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting to remote peer uuid: XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror: (95) Operation not supported
> >>>>>>>>>>>>     rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout: pool_id=2, callout_id=2, callout_level=error, text=unable to connect to remote cluster
> >>>>>>>>>>>
> >>>>>>>>>>> If it's still broken after fixing your caps above, perhaps
> >>>>>>>>>>> increase debugging for "rados", "monc", "auth", and "ms" to see
> >>>>>>>>>>> if you can determine the source of the op not supported error.
> >>>>>>>>>>>
> >>>>>>>>>>>> I already tried storing the ceph.client.rbd_mirror.keyring
> >>>>>>>>>>>> (i.e. from the cluster with the live images) on the rbd-mirror
> >>>>>>>>>>>> machine explicitly (i.e. not only in mon config storage), and
> >>>>>>>>>>>> after doing that:
> >>>>>>>>>>>>     rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
> >>>>>>>>>>>> works fine. So it's not a connectivity issue. Maybe a
> >>>>>>>>>>>> permission issue? Or did I miss something?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Any idea what "operation not supported" means?
> >>>>>>>>>>>> It's unclear to me whether things should work well using Mimic
> >>>>>>>>>>>> with Nautilus, and whether enabling pool mirroring while only
> >>>>>>>>>>>> having journaling on for two images is a supported case.
> >>>>>>>>>>>
> >>>>>>>>>>> Yes and yes.
> >>>>>>>>>>>
> >>>>>>>>>>>> 2) Since there is a performance drawback (about 2x) for
> >>>>>>>>>>>> journaling, is it also possible to only mirror snapshots, and
> >>>>>>>>>>>> leave the live volumes alone?
> >>>>>>>>>>>> This would cover the common backup use case before deferred
> >>>>>>>>>>>> mirroring is implemented (or is it there already?).
> >>>>>>>>>>>
> >>>>>>>>>>> This is in-development right now and will hopefully land for the
> >>>>>>>>>>> Octopus release.
> >>>>>>>>>>
> >>>>>>>>>> That would be very cool. Just to clarify: You mean the "real"
> >>>>>>>>>> deferred mirroring, not a "snapshot only" mirroring?
> >>>>>>>>>> Is it already clear if this will require Octopus (or a later
> >>>>>>>>>> release) on both ends, or only on the receiving side?
> >>>>>>>>>
> >>>>>>>>> I might not be sure what you mean by deferred mirroring. You can
> >>>>>>>>> delay the replay of the journal via the
> >>>>>>>>> "rbd_mirroring_replay_delay" configuration option so that your DR
> >>>>>>>>> site can be X seconds behind the primary at a minimum.
> >>>>>>>>
> >>>>>>>> This is indeed what I was thinking of...
> >>>>>>>>
> >>>>>>>>> For Octopus we are working on on-demand and scheduled snapshot
> >>>>>>>>> mirroring between sites -- no journal is involved.
> >>>>>>>>
> >>>>>>>> ... and this is what I was dreaming of. We keep snapshots of VMs to
> >>>>>>>> be able to roll them back.
> >>>>>>>> We'd like to also keep those snapshots in a separate Ceph instance
> >>>>>>>> as an additional safety-net (in addition to an offline backup of
> >>>>>>>> those snapshots with Benji backup).
> >>>>>>>> It is not (yet) clear to me whether we can pay the "2x" price for
> >>>>>>>> journaling in the long run, so this would be the way to go in case
> >>>>>>>> we can't.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Since I got you personally, I have two bonus questions.
> >>>>>>>>>>
> >>>>>>>>>> 1) Your talk:
> >>>>>>>>>> https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring.pdf
> >>>>>>>>>> mentions "rbd journal object flush age", which I'd translate to
> >>>>>>>>>> something like the "commit" mount option on a classical file
> >>>>>>>>>> system - correct?
> >>>>>>>>>> I don't find this switch documented anywhere, though - is there
> >>>>>>>>>> experience with it / what's the default?
> >>>>>>>>>
> >>>>>>>>> It's a low-level knob that by default causes the journal to flush
> >>>>>>>>> its pending IO events before it allows the corresponding IO to be
> >>>>>>>>> issued against the backing image. Setting it to a value greater
> >>>>>>>>> than zero will allow that many seconds of IO events to be batched
> >>>>>>>>> together in a journal append operation, and it's helpful for
> >>>>>>>>> high-throughput, small IO operations. Of course, it turned out
> >>>>>>>>> that a bug had broken that option a while back, where events would
> >>>>>>>>> never batch, so a fix is currently scheduled for backport to all
> >>>>>>>>> active releases [1] with the goal that no one should need to tweak
> >>>>>>>>> it.
> >>>>>>>>
> >>>>>>>> That's even better - since our setup is growing and we will keep
> >>>>>>>> upgrading, I'll then just keep things as they are now (no manual
> >>>>>>>> tweaking) and tag along the development. Thanks!
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> 2) I read I can run more than one rbd-mirror with Mimic/Nautilus.
> >>>>>>>>>> Do they load-balance the images, or "only" fail over in case one
> >>>>>>>>>> of them dies?
> >>>>>>>>>
> >>>>>>>>> Starting with Nautilus, the default configuration for rbd-mirror
> >>>>>>>>> is to evenly divide the number of mirrored images between all
> >>>>>>>>> running daemons. This does not split the total load since some
> >>>>>>>>> images might be hotter than others, but it at least spreads the
> >>>>>>>>> load.
> >>>>>>>>
> >>>>>>>> That's fine enough for our use case. Spreading by "hotness" is a
> >>>>>>>> task without a clear answer, and "temperature" may change quickly,
> >>>>>>>> so that's all I hoped for.
> >>>>>>>>
> >>>>>>>> Many thanks again for the very helpful explanations!
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Oliver
> >>>>>>>>
> >>>>>>>>>> Cheers and many thanks for the quick and perfect help!
> >>>>>>>>>> Oliver
> >>>>>>>>>>
> >>>>>>>>>>>> Cheers and thanks in advance,
> >>>>>>>>>>>> Oliver
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Jason
> >>>>>>>>>
> >>>>>>>>> [1] https://github.com/ceph/ceph/pull/28539
> >>>>>>
> >>>>>> [1] https://tracker.ceph.com/issues/41780
> >>
> >> --
> >> Jason

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com