On Tue, 9 Apr 2019 at 17:48, Jason Dillaman <jdill...@redhat.com> wrote:

> Any chance your rbd-mirror daemon has the admin sockets available
> (defaults to /var/run/ceph/cephdr-client.<id>.<pid>.<random>.asok)? If
> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".
>

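For reference, this is roughly the invocation that produced the output below
(the socket name differs per host, so "<rbd-mirror-socket>" is a placeholder
to pick from the directory listing):

# ls /var/run/ceph/*.asok
# ceph --admin-daemon /var/run/ceph/<rbd-mirror-socket>.asok rbd mirror status
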
{
    "pool_replayers": [
        {
            "pool": "glance",
            "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster: production client: client.productionbackup",
            "instance_id": "869081",
            "leader_instance_id": "869081",
            "leader": true,
            "instances": [],
            "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
            "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.94225675210000.asok",
            "sync_throttler": {
                "max_parallel_syncs": 5,
                "running_syncs": 0,
                "waiting_syncs": 0
            },
            "image_replayers": [
                {
                    "name": "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0",
                    "state": "Replaying"
                },
                {
                    "name": "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62",
                    "state": "Replaying"
                },
-------------------cut----------
                {
                    "name": "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05",
                    "state": "Replaying"
                }
            ]
        },
        {
            "pool": "nova",
            "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster: production client: client.productionbackup",
            "instance_id": "889074",
            "leader_instance_id": "889074",
            "leader": true,
            "instances": [],
            "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225678548048.asok",
            "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok",
            "sync_throttler": {
                "max_parallel_syncs": 5,
                "running_syncs": 0,
                "waiting_syncs": 0
            },
            "image_replayers": []
        }
    ],
    "image_deleter": {
        "image_deleter_status": {
            "delete_images_queue": [
                {
                    "local_pool_id": 3,
                    "global_image_id": "ff531159-de6f-4324-a022-50c079dedd45"
                }
            ],
            "failed_deletes_queue": []
        }
    }
}

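To gauge how far behind the backup's mirror client lags on a given image,
something like the following should show the journal's registered clients
and their commit positions (image name is a placeholder; syntax as I
understand it for Luminous):

# rbd journal status --pool glance --image <image-name>
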
>
> On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund <mag...@gronlund.se>
> wrote:
> >
> >
> >
> > On Tue, 9 Apr 2019 at 17:14, Jason Dillaman <jdill...@redhat.com> wrote:
> >>
> >> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund <mag...@gronlund.se>
> wrote:
> >> >
> >> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund <mag...@gronlund.se>
> wrote:
> >> > >>
> >> > >> Hi,
> >> > >> We have configured one-way replication of pools between a
> production cluster and a backup cluster. Unfortunately, either the rbd-mirror
> daemon or the backup cluster itself is unable to keep up with the production
> cluster, so the replication fails to reach the replaying state.
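> >> > >> (For context, a one-way setup like this is typically configured by
> enabling mirroring on each pool on both clusters and registering the
> production cluster as a peer on the backup side only, roughly as below; the
> peer client and cluster names are taken from the status output above:)
> >> > >>
> >> > >> # rbd mirror pool enable glance pool
> >> > >> # rbd mirror pool peer add glance client.productionbackup@production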
> >> > >
> >> > >Hmm, it's odd that they don't at least reach the replaying state. Are
> >> > >they still performing the initial sync?
> >> >
> >> > There are three pools we try to mirror (glance, cinder, and nova; no
> points for guessing what the cluster is used for :) ).
> >> > The glance and cinder pools are smaller and see limited write
> activity, and their mirroring works; the nova pool, which is the largest and
> has 90% of the write activity, never leaves the "unknown" state.
> >> >
> >> > # rbd mirror pool status cinder
> >> > health: OK
> >> > images: 892 total
> >> >     890 replaying
> >> >     2 stopped
> >> > #
> >> > # rbd mirror pool status nova
> >> > health: WARNING
> >> > images: 2479 total
> >> >     2479 unknown
> >> > #
> >> > The production cluster has 5k writes/s on average and the backup
> cluster has 1-2k writes/s on average. The production cluster is bigger and
> has better specs. I thought that the backup cluster would be able to keep
> up, but it looks like I was wrong.
> >>
> >> The fact that they are in the unknown state just means that the remote
> >> "rbd-mirror" daemon hasn't started any journal replayers against the
> >> images. If it couldn't keep up, it would still report a status of
> >> "up+replaying". What Ceph release are you running on your backup
> >> cluster?
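> >>
> >> (A per-image view might help here; something like the following should
> >> list each image with its state and a short description:
> >>
> >> # rbd mirror pool status nova --verbose
> >>
> >> assuming the --verbose flag is available on your Luminous build.)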
> >>
> > The backup cluster is running Luminous 12.2.11 (the production cluster
> 12.2.10)
> >
> >>
> >> > >> And the journals on the rbd volumes keep growing...
> >> > >>
> >> > >> Is it enough to simply disable the mirroring of the pool (rbd
> mirror pool disable <pool>) to remove the lagging reader from the
> journals and shrink them, or is there anything else that has to be done?
> >> > >
> >> > >You can either disable the journaling feature on the image(s) since
> >> > >there is no point to leave it on if you aren't using mirroring, or
> run
> >> > >"rbd mirror pool disable <pool>" to purge the journals.
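> >> > >
> >> > >E.g., one of the following, matching the two options above (image
> name is a placeholder):
> >> > >
> >> > ># rbd feature disable nova/<image-name> journaling
> >> > ># rbd mirror pool disable nova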
> >> >
> >> > Thanks for the confirmation.
> >> > I will stop mirroring the nova pool and try to figure out if
> there is anything we can do to get the backup cluster to keep up.
> >> >
> >> > >> Best regards
> >> > >> /Magnus
> >> > >
> >> > >--
> >> > >Jason
> >>
> >>
> >>
> >> --
> >> Jason
>
>
>
> --
> Jason
>