Thanks. I also set osd_op_queue_cut_off to high in the global section (as you mentioned in a previous thread that both the OSDs and the MDS should use it).
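
For reference, on Nautilus this can be done through the centralized config database (a sketch; the option only takes effect once the daemons are restarted):

    ceph config set global osd_op_queue_cut_off high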
F.

On 26/06/2020 at 16:35, Frank Schilder wrote:
I never tried "prio" out, but the reports I have seen claim that prio is 
inferior.

However, as far as I know it is safe to change these settings. Unfortunately, 
you need to restart services to apply the changes.
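
On systemd-based deployments the restarts would look something like this (a sketch; the daemon IDs/names below are placeholders for your own):

    systemctl restart ceph-osd@0.service      # repeat per OSD id, a few at a time
    systemctl restart ceph-mds@mds-a.service  # MDS instance name is deployment-specific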

Before you do, check that *all* daemons are using the same setting. Contrary to 
what the naming (osd_*) suggests, this setting applies to all daemons. I added 
it to the global options and, most notably, the performance of the MDS improved a lot.
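
You can query the running value of each daemon through its admin socket on the host where it runs, for example (daemon names are placeholders):

    ceph daemon osd.0 config get osd_op_queue_cut_off
    ceph daemon mds.mds-a config get osd_op_queue_cut_off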

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Francois Legrand <f...@lpnhe.in2p3.fr>
Sent: 26 June 2020 15:03:23
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Removing pool in nautilus is incredibly slow

I changed osd_op_queue_cut_off to high and rebooted all the OSDs. But
the result is more or less the same (storage is still extremely slow,
2h30 to export a 64 GB image with rbd!). The only improvement is that the
degraded PGs seem to have disappeared (which is at least a good
point). It looks like there is a problem with the prioritisation of operations.
So do you (and others on the list) think that changing the
osd_op_queue setting could help (e.g. to prio or mclock_client)?
What are the risks or side effects of trying mclock_client on a
production cluster (is it safe)?
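
(For reference, a sketch of what such a change would look like in ceph.conf; as far as I know the mclock-based queues were still marked experimental around Nautilus, so treat this with care:

    [global]
    osd_op_queue = mclock_client

followed by a restart of all daemons.)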
F.

On 26/06/2020 at 09:46, Frank Schilder wrote:
I'm using

osd_op_queue = wpq
osd_op_queue_cut_off = high

and these settings are recommended.
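
For completeness, in ceph.conf these belong in the global section so that all daemons (not just the OSDs) pick them up, e.g. (a sketch; a restart is needed for the changes to take effect):

    [global]
    osd_op_queue = wpq
    osd_op_queue_cut_off = high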

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Francois Legrand <f...@lpnhe.in2p3.fr>
Sent: 26 June 2020 09:44:00
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Removing pool in nautilus is incredibly slow

We are now using osd_op_queue = wpq. Maybe switching back to prio would help?
What are you using on your mimic cluster?
F.

On 25/06/2020 at 19:28, Frank Schilder wrote:
OK, this *does* sound bad. I would consider this a show-stopper for upgrading 
from mimic.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Francois Legrand <f...@lpnhe.in2p3.fr>
Sent: 25 June 2020 19:25:14
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Removing pool in nautilus is incredibly slow

I also had this kind of symptom with Nautilus.
Replacing a failed disk (starting from a healthy cluster) generates degraded objects.
Also, we have a Proxmox cluster accessing VM images stored in our Ceph storage 
via rbd.
Each time I performed an operation on the Ceph cluster, like adding or removing a 
pool, most of our Proxmox VMs lost contact with their system disk in Ceph and 
crashed (or remounted their system storage in read-only mode). At first I thought it 
was a network problem, but now I am sure it is related to Ceph becoming 
unresponsive during background operations.
For now, Proxmox cannot even access the Ceph storage via rbd (it fails with a 
timeout).