On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman <jdill...@redhat.com> wrote:
>
> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth
> <freyerm...@physik.uni-bonn.de> wrote:
> >
> > Dear Jason,
> >
> > thanks for the very detailed explanation! This was very instructive.
> > Sadly, the watchers look correct - see details inline.
> >
> > On 13.09.19 at 15:02, Jason Dillaman wrote:
> > > On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
> > > <freyerm...@physik.uni-bonn.de> wrote:
> > >>
> > >> Dear Jason,
> > >>
> > >> thanks for taking care and developing a patch so quickly!
> > >>
> > >> I have another strange observation to share. In our test setup, only a
> > >> single RBD mirroring daemon is running for 51 images.
> > >> It works fine with a constant stream of 1-2 MB/s, but at some point
> > >> after roughly 20 hours, _all_ images go into this interesting state:
> > >> -----------------------------------------
> > >> # rbd mirror image status test-vm.XXXXX-disk2
> > >> test-vm.XXXXX-disk2:
> > >>   global_id:   XXXXXXXXXXXXXXX
> > >>   state:       down+replaying
> > >>   description: replaying, master_position=[object_number=14, tag_tid=6, entry_tid=6338], mirror_position=[object_number=14, tag_tid=6, entry_tid=6338], entries_behind_master=0
> > >>   last_update: 2019-09-13 03:45:43
> > >> -----------------------------------------
> > >> Running this command several times, I see entry_tid increasing at both
> > >> ends, so mirroring seems to be working just fine.
> > >>
> > >> However:
> > >> -----------------------------------------
> > >> # rbd mirror pool status
> > >> health: WARNING
> > >> images: 51 total
> > >>     51 unknown
> > >> -----------------------------------------
> > >> The health warning is not visible in the dashboard (also not in the
> > >> mirroring menu), the daemon still seems to be running, has dropped
> > >> nothing into the logs, and claims to be "ok" in the dashboard - it's
> > >> only that all images show up in unknown state, even though everything
> > >> seems to be working fine.
> > >>
> > >> Any idea on how to debug this?
> > >> When I restart the rbd-mirror service, all images come back as green.
> > >> I have already encountered this twice in 3 days.
> > >
> > > The dashboard relies on the rbd-mirror daemon to provide it errors and
> > > warnings. You can see the status reported by rbd-mirror by running
> > > "ceph service status":
> > >
> > > $ ceph service status
> > > {
> > >     "rbd-mirror": {
> > >         "4152": {
> > >             "status_stamp": "2019-09-13T08:58:41.937491-0400",
> > >             "last_beacon": "2019-09-13T08:58:41.937491-0400",
> > >             "status": {
> > >                 "json": "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}"
> > >             }
> > >         }
> > >     }
> > > }
> > >
> > > In your case, rbd-mirror most likely thinks all is good with the world,
> > > so it's not reporting any errors.
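(As an aside, since the interesting counters are buried in that nested
string: the per-daemon status is embedded as a JSON-encoded string inside
the outer JSON, so unpacking it programmatically needs a "fromjson" step.
A rough sketch, assuming jq is available and that your ceph CLI accepts
the usual "--format json" switch for this command:

$ ceph service status --format json | \
    jq '.["rbd-mirror"] | to_entries[] |
        {daemon: .key, status: (.value.status.json | fromjson)}'

That makes fields like image_error_count and image_warning_count per
daemon easy to eyeball or feed into monitoring.)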
> >
> > This is indeed the case:
> >
> > # ceph service status
> > {
> >     "rbd-mirror": {
> >         "84243": {
> >             "status_stamp": "2019-09-13 15:40:01.149815",
> >             "last_beacon": "2019-09-13 15:40:26.151381",
> >             "status": {
> >                 "json": "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"84247\",\"leader\":true}}"
> >             }
> >         }
> >     },
> >     "rgw": {
> >         ...
> >     }
> > }
> >
> > > The "down" state indicates that the rbd-mirror daemon isn't correctly
> > > watching the "rbd_mirroring" object in the pool. You can see who is
> > > watching that object by running the "rados" "listwatchers" command:
> > >
> > > $ rados -p <pool name> listwatchers rbd_mirroring
> > > watcher=1.2.3.4:0/199388543 client.4154 cookie=94769010788992
> > > watcher=1.2.3.4:0/199388543 client.4154 cookie=94769061031424
> > >
> > > In my case, the "4154" from "client.4154" is the unique global id for
> > > my connection to the cluster, which relates back to the "ceph service
> > > status" dump, which also shows status by daemon using the unique
> > > global id.
> >
> > Sadly(?), this looks as expected:
> >
> > # rados -p rbd listwatchers rbd_mirroring
> > watcher=10.160.19.240:0/2922488671 client.84247 cookie=139770046978672
> > watcher=10.160.19.240:0/2922488671 client.84247 cookie=139771389162560
>
> Hmm, the unique id is different (84243 vs 84247). I wouldn't have
> expected the global id to have changed. Did you restart the Ceph
> cluster or MONs? Do you see any "peer assigned me a different
> global_id" errors in your rbd-mirror logs?
>
> I'll open a tracker ticket to fix "ceph service status", though, since
> your global id clearly changed but this wasn't noticed by the service
> daemon status updater.
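(For checking that on your side: the instance id in the status JSON, the
client id shown by listwatchers, and the daemon's own log should all tell
a consistent story. A rough sketch - the unit name and log file path are
examples, adjust them to the id your rbd-mirror daemon actually runs as:

$ rados -p rbd listwatchers rbd_mirroring
$ ceph service status
$ grep -i global_id /var/log/ceph/ceph-client.rbd_mirror_backup*.log

If the MONs handed out a new global id at some point, the last command
should turn up a "peer assigned me a different global_id" line.)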
... also, can you please provide the output from the following via a
pastebin link?

# rados -p rbd listomapvals rbd_mirroring

> > However, the dashboard still shows those images as "unknown", and this
> > also shows up on the command line:
> >
> > # rbd mirror pool status
> > health: WARNING
> > images: 51 total
> >     51 unknown
> > # rbd mirror image status test-vm.physik.uni-bonn.de-disk1
> > test-vm.physik.uni-bonn.de-disk2:
> >   global_id:   1a53fafa-37ef-4edf-9633-c2ba3323ed93
> >   state:       down+replaying
> >   description: replaying, master_position=[object_number=18, tag_tid=6, entry_tid=25202], mirror_position=[object_number=18, tag_tid=6, entry_tid=25202], entries_behind_master=0
> >   last_update: 2019-09-13 15:55:15
> >
> > Any ideas on what else could cause this?
> >
> > Cheers and thanks,
> > Oliver
> >
> > >> Any idea on this (or how I can extract more information)?
> > >> I fear keeping high-level debug logs active for ~24h is not feasible.
> > >>
> > >> Cheers,
> > >> Oliver
> > >>
> > >> On 2019-09-11 19:14, Jason Dillaman wrote:
> > >>> On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth
> > >>> <freyerm...@physik.uni-bonn.de> wrote:
> > >>>>
> > >>>> Dear Jason,
> > >>>>
> > >>>> I played a bit more with rbd mirroring and learned that deleting an
> > >>>> image at the source (or disabling journaling on it) immediately
> > >>>> moves the image to trash at the target - but setting
> > >>>> rbd_mirroring_delete_delay helps to have some more grace time to
> > >>>> catch human mistakes.
> > >>>>
> > >>>> However, I have issues restoring (as user) such an image which has
> > >>>> been moved to trash by the rbd-mirror daemon:
> > >>>> -----------------------------------
> > >>>> [root@mon001 ~]# rbd trash ls -la
> > >>>> ID           NAME                             SOURCE    DELETED_AT               STATUS                                   PARENT
> > >>>> d4fbe8f63905 test-vm-XXXXXXXXXXXXXXXXXX-disk2 MIRRORING Wed Sep 11 18:43:14 2019 protected until Thu Sep 12 18:43:14 2019
> > >>>> [root@mon001 ~]# rbd trash restore --image foo-image d4fbe8f63905
> > >>>> rbd: restore error: 2019-09-11 18:50:15.387 7f5fa9590b00 -1 librbd::api::Trash: restore: Current trash source: mirroring does not match expected: user
> > >>>> (22) Invalid argument
> > >>>> -----------------------------------
> > >>>> This is issued on the mon, which has the client.admin key, so it
> > >>>> should not be a permission issue.
> > >>>> It also fails when I try it in the dashboard.
> > >>>>
> > >>>> Sadly, the error message is not clear enough for me to figure out
> > >>>> what could be the problem - do you see what I did wrong?
> > >>>
> > >>> Good catch, it looks like we accidentally broke this in Nautilus
> > >>> when image live-migration support was added. I've opened a new
> > >>> tracker ticket to fix this [1].
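(For anyone finding this thread later: the grace period Oliver mentions is
the rbd-mirror delete delay. A rough sketch of setting and verifying it -
the 86400-second value is just an example, and on releases without the mon
config db you would put the option into ceph.conf on the rbd-mirror host
instead:

$ ceph config set client rbd_mirroring_delete_delay 86400
$ rbd trash ls --all --long rbd

Images removed by the mirroring process then show up with SOURCE=MIRRORING
in the trash listing until the delay expires.)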
> > >>>> Cheers and thanks again,
> > >>>> Oliver
> > >>>>
> > >>>> On 2019-09-10 23:17, Oliver Freyermuth wrote:
> > >>>>> Dear Jason,
> > >>>>>
> > >>>>> On 2019-09-10 23:04, Jason Dillaman wrote:
> > >>>>>> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
> > >>>>>> <freyerm...@physik.uni-bonn.de> wrote:
> > >>>>>>>
> > >>>>>>> Dear Jason,
> > >>>>>>>
> > >>>>>>> On 2019-09-10 18:50, Jason Dillaman wrote:
> > >>>>>>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
> > >>>>>>>> <freyerm...@physik.uni-bonn.de> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Dear Cephalopodians,
> > >>>>>>>>>
> > >>>>>>>>> I have two questions about RBD mirroring.
> > >>>>>>>>>
> > >>>>>>>>> 1) I cannot get it to work - my setup is:
> > >>>>>>>>>
> > >>>>>>>>> - One cluster holding the live RBD volumes and snapshots, in
> > >>>>>>>>> pool "rbd", cluster name "ceph", running latest Mimic.
> > >>>>>>>>> I ran "rbd mirror pool enable rbd pool" on that cluster and
> > >>>>>>>>> created a cephx user "rbd_mirror" with (is there a better way?):
> > >>>>>>>>>   ceph auth get-or-create client.rbd_mirror mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow pool rbd r' -o ceph.client.rbd_mirror.keyring --cluster ceph
> > >>>>>>>>> In that pool, two images have the journaling feature activated;
> > >>>>>>>>> all others still have it disabled (so I would expect only these
> > >>>>>>>>> two to be mirrored).
> > >>>>>>>>
> > >>>>>>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for the
> > >>>>>>>> caps -- but you definitely need more than read-only permissions
> > >>>>>>>> on the remote cluster, since it needs to be able to create
> > >>>>>>>> snapshots of remote images and update/trim the image journals.
> > >>>>>>>
> > >>>>>>> These profiles really make life a lot easier. I should have
> > >>>>>>> thought of them rather than "guessing" a potentially good
> > >>>>>>> configuration...
> > >>>>>>>
> > >>>>>>>>> - Another (empty) cluster running latest Nautilus, cluster name
> > >>>>>>>>> "ceph", pool "rbd".
> > >>>>>>>>> I've used the dashboard to activate mirroring for the RBD pool,
> > >>>>>>>>> and then added a peer with cluster name "ceph-virt", cephx ID
> > >>>>>>>>> "rbd_mirror", and filled in the mons and the key created above.
> > >>>>>>>>> I've then run:
> > >>>>>>>>>   ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow pool rbd rwx' -o client.rbd_mirror_backup.keyring --cluster ceph
> > >>>>>>>>> deployed that key on the rbd-mirror machine, and started the
> > >>>>>>>>> service with:
> > >>>>>>>>
> > >>>>>>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your
> > >>>>>>>> caps [1].
> > >>>>>>>
> > >>>>>>> That did the trick (in combination with the above)!
> > >>>>>>> Again a case of PEBKAC: I should have read the documentation to
> > >>>>>>> the end; clearly my fault.
> > >>>>>>>
> > >>>>>>> It works well now, even though it seems to run a bit slowly
> > >>>>>>> (~35 MB/s for the initial sync when everything is 1 GBit/s), but
> > >>>>>>> that may also be caused by a combination of some very limited
> > >>>>>>> hardware on the receiving end (which will be scaled up in the
> > >>>>>>> future).
> > >>>>>>> A single host with 6 disks, replica 3, and a RAID controller
> > >>>>>>> which can only do RAID0 and not JBOD is certainly not ideal, so
> > >>>>>>> commit latency may explain this low bandwidth.
> > >>>>>>
> > >>>>>> You could try increasing "rbd_concurrent_management_ops" from the
> > >>>>>> default of 10 ops to something higher to attempt to account for
> > >>>>>> the latency. However, I wouldn't expect near-line speed w/ RBD
> > >>>>>> mirroring.
> > >>>>>
> > >>>>> Thanks - I will play with this option once we have more storage
> > >>>>> available in the target pool ;-).
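(For reference when you get to it: the option can go into ceph.conf on the
rbd-mirror host, or on Nautilus into the mon config db. A sketch, with 20
as an arbitrary example value:

$ ceph config set client rbd_concurrent_management_ops 20

Higher values mainly allow more objects to be copied in parallel, which
can help hide the per-op commit latency during the initial sync discussed
above.)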
> > >>>>>>>>>   systemctl start ceph-rbd-mirror@rbd_mirror_backup.service
> > >>>>>>>>>
> > >>>>>>>>> After this, everything looks fine:
> > >>>>>>>>> # rbd mirror pool info
> > >>>>>>>>> Mode: pool
> > >>>>>>>>> Peers:
> > >>>>>>>>>   UUID        NAME      CLIENT
> > >>>>>>>>>   XXXXXXXXXXX ceph-virt client.rbd_mirror
> > >>>>>>>>>
> > >>>>>>>>> The service also seems to start fine, but the logs show (with
> > >>>>>>>>> debug rbd_mirror=20):
> > >>>>>>>>>
> > >>>>>>>>> rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys: retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXXXXXXXXXX
> > >>>>>>>>> rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
> > >>>>>>>>> rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting failed pool replayer for uuid: XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror
> > >>>>>>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid: XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror
> > >>>>>>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting to remote peer uuid: XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror: (95) Operation not supported
> > >>>>>>>>> rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout: pool_id=2, callout_id=2, callout_level=error, text=unable to connect to remote cluster
> > >>>>>>>>
> > >>>>>>>> If it's still broken after fixing your caps above, perhaps
> > >>>>>>>> increase debugging for "rados", "monc", "auth", and "ms" to see
> > >>>>>>>> if you can determine the source of the "op not supported" error.
> > >>>>>>>>
> > >>>>>>>>> I already tried storing the ceph.client.rbd_mirror.keyring
> > >>>>>>>>> (i.e. the one from the cluster with the live images) on the
> > >>>>>>>>> rbd-mirror machine explicitly (i.e. not only in the mon config
> > >>>>>>>>> storage), and after doing that:
> > >>>>>>>>>   rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
> > >>>>>>>>> works fine. So it's not a connectivity issue. Maybe a
> > >>>>>>>>> permission issue? Or did I miss something?
> > >>>>>>>>>
> > >>>>>>>>> Any idea what "operation not supported" means?
> > >>>>>>>>> It's also unclear to me whether running Mimic against Nautilus
> > >>>>>>>>> should work, and whether enabling pool mirroring while having
> > >>>>>>>>> journaling enabled on only two images is a supported case.
> > >>>>>>>>
> > >>>>>>>> Yes and yes.
> > >>>>>>>>
> > >>>>>>>>> 2) Since there is a performance drawback (about 2x) for
> > >>>>>>>>> journaling, is it also possible to mirror only snapshots and
> > >>>>>>>>> leave the live volumes alone?
> > >>>>>>>>> This would cover the common backup use case before deferred
> > >>>>>>>>> mirroring is implemented (or is it there already?).
> > >>>>>>>>
> > >>>>>>>> This is in development right now and will hopefully land for
> > >>>>>>>> the Octopus release.
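(Related note on the journaling cost: the feature can be checked and
toggled per image, so the roughly 2x write penalty only has to be paid for
images you actually want mirrored. For example, with a hypothetical image
name:

$ rbd info rbd/test-image | grep features
$ rbd feature enable rbd/test-image journaling
$ rbd feature disable rbd/test-image journaling

Keep in mind, as discussed further up in the thread, that disabling
journaling on a primary image causes the remote copy to be moved to the
trash on the target cluster.)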
> > >>>>>>>
> > >>>>>>> That would be very cool. Just to clarify: you mean the "real"
> > >>>>>>> deferred mirroring, not a "snapshot only" mirroring?
> > >>>>>>> Is it already clear whether this will require Octopus (or a
> > >>>>>>> later release) on both ends, or only on the receiving side?
> > >>>>>>
> > >>>>>> I might not be sure what you mean by deferred mirroring. You can
> > >>>>>> delay the replay of the journal via the
> > >>>>>> "rbd_mirroring_replay_delay" configuration option so that your DR
> > >>>>>> site is at minimum X seconds behind the primary.
> > >>>>>
> > >>>>> This is indeed what I was thinking of...
> > >>>>>
> > >>>>>> For Octopus we are working on on-demand and scheduled snapshot
> > >>>>>> mirroring between sites -- no journal is involved.
> > >>>>>
> > >>>>> ... and this is what I was dreaming of. We keep snapshots of VMs
> > >>>>> to be able to roll them back.
> > >>>>> We'd like to also keep those snapshots in a separate Ceph instance
> > >>>>> as an additional safety net (in addition to an offline backup of
> > >>>>> those snapshots with Benji backup).
> > >>>>> It is not (yet) clear to me whether we can pay the "2x" price for
> > >>>>> journaling in the long run, so this would be the way to go in case
> > >>>>> we can't.
> > >>>>>
> > >>>>>>> Since I have you personally, I have two bonus questions.
> > >>>>>>>
> > >>>>>>> 1) Your talk:
> > >>>>>>> https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring.pdf
> > >>>>>>> mentions "rbd journal object flush age", which I'd compare to
> > >>>>>>> something like the "commit" mount option on a classical file
> > >>>>>>> system - correct?
> > >>>>>>> I can't find this switch documented anywhere, though - is there
> > >>>>>>> experience with it / what's the default?
> > >>>>>>
> > >>>>>> It's a low-level knob that by default causes the journal to flush
> > >>>>>> its pending IO events before it allows the corresponding IO to be
> > >>>>>> issued against the backing image. Setting it to a value greater
> > >>>>>> than zero will allow that many seconds of IO events to be batched
> > >>>>>> together in a journal append operation, and it's helpful for
> > >>>>>> high-throughput, small-IO workloads. Of course, it turned out
> > >>>>>> that a bug had broken that option a while back such that events
> > >>>>>> would never batch, so a fix is currently scheduled for backport
> > >>>>>> to all active releases [1], with the goal that no one should need
> > >>>>>> to tweak it.
> > >>>>>
> > >>>>> That's even better - since our setup is growing and we will keep
> > >>>>> upgrading, I'll just keep things as they are now (no manual
> > >>>>> tweaking) and tag along with the development. Thanks!
> > >>>>>
> > >>>>>>> 2) I read that I can run more than one rbd-mirror daemon with
> > >>>>>>> Mimic/Nautilus. Do they load-balance the images, or "only" fail
> > >>>>>>> over in case one of them dies?
> > >>>>>>
> > >>>>>> Starting with Nautilus, the default configuration for rbd-mirror
> > >>>>>> is to evenly divide the number of mirrored images between all
> > >>>>>> running daemons. This does not split the total load, since some
> > >>>>>> images might be hotter than others, but it at least spreads the
> > >>>>>> load.
> > >>>>>
> > >>>>> That's fine enough for our use case. Spreading by "hotness" is a
> > >>>>> task without a clear answer, and "temperature" may change quickly,
> > >>>>> so that's all I hoped for.
> > >>>>>
> > >>>>> Many thanks again for the very helpful explanations!
> > >>>>>
> > >>>>> Cheers,
> > >>>>> Oliver
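(One last pointer on the replay delay mentioned above: on Nautilus it can
be applied globally or per image. A sketch - the 600-second value and the
image name are examples, and I'd double-check the per-image form on your
exact release:

$ ceph config set client rbd_mirroring_replay_delay 600
$ rbd config image set rbd/test-image rbd_mirroring_replay_delay 600

The rbd-mirror daemon then keeps the non-primary image at least that many
seconds behind the primary when replaying the journal.)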
> > >>>>>>>
> > >>>>>>> Cheers and many thanks for the quick and perfect help!
> > >>>>>>> Oliver
> > >>>>>>>
> > >>>>>>>>> Cheers and thanks in advance,
> > >>>>>>>>> Oliver
> > >>>>>>>>
> > >>>>>>>> [1] https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Jason
> > >>>>>>
> > >>>>>> [1] https://github.com/ceph/ceph/pull/28539
> > >>>
> > >>> [1] https://tracker.ceph.com/issues/41780
>
> --
> Jason

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com