Thanks for your answers. We will also experiment with osd recovery max active / threads and will come back to you.
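In case it is useful to anyone following the thread, this is roughly what we plan to try first - the values are only a conservative starting point on our side, nothing we have validated yet:

    [osd]
    osd recovery max active = 1
    osd recovery threads = 1

or injected into the already running OSDs without a restart:

    ceph tell osd.* injectargs '--osd-recovery-max-active 1'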
Regards,
Kostis

On 16 July 2015 at 12:29, Jan Schermer <j...@schermer.cz> wrote:
> For me setting recovery_delay_start helps during the OSD bootup _sometimes_, but it clearly does something different than what's in the docs.
>
> Docs say:
> After peering completes, Ceph will delay for the specified number of seconds before starting to recover objects.
>
> However, what I see is greatly slowed recovery, not a delayed start of recovery. It seems to basically sleep between recovering the PGs. AFAIK peering is already done unless I was remapping the PGs at the same moment, so I am not sure what is happening there in reality.
>
> We had this set to 20 for some time and recovering after a host restart took close to two hours. With this parameter set to 0, it recovered in less than 30 seconds (and caused no slow requests or anything).
>
> So what I usually do is set this to a high number (like 200), and after all the OSDs are started I set it to 0. This does not completely prevent slow requests from happening, but it does help somewhat.
>
> Jan
>
>> On 15 Jul 2015, at 11:52, Andrey Korolyov <and...@xdel.ru> wrote:
>>
>> On Wed, Jul 15, 2015 at 12:15 PM, Jan Schermer <j...@schermer.cz> wrote:
>>> We have the same problem; we need to start the OSDs slowly. The problem seems to be CPU congestion: a booting OSD will use all the CPU power you give it, and if it does not have enough, nasty stuff happens (this might actually be the manifestation of some kind of problem in our setup as well). It does not always do that - I was restarting our hosts this weekend and most of them came up fine with a simple "service ceph start", while some just sat there spinning the CPU and not doing any real work (and the cluster was not very happy about that).
>>>
>>> Jan
>>>
>>>> On 15 Jul 2015, at 10:53, Kostis Fardelas <dante1...@gmail.com> wrote:
>>>>
>>>> Hello,
>>>> after some trial and error we concluded that if we start the 6 stopped OSD daemons with a delay of 1 minute between them, we do not experience slow requests (the threshold is set at 30 sec), although there are still some ops that take up to 10 s, which is already quite high. I assume that if we spread the delay out further, the slow requests will vanish. We cannot rule out that our setup is not tuned down to the finest detail, but I wonder whether we are missing some Ceph tuning in terms of configuration.
>>>>
>>>> We run the latest stable firefly release.
>>>>
>>>> Regards,
>>>> Kostis
>>>>
>>>> On 13 July 2015 at 13:28, Kostis Fardelas <dante1...@gmail.com> wrote:
>>>>> Hello,
>>>>> after rebooting a ceph node and the OSDs starting to boot and join the cluster, we experience slow requests that get resolved immediately after the cluster recovers. It is important to note that before the node reboot we set the noout flag in order to prevent recovery - so there are only degraded PGs while the OSDs are down - and to let the cluster handle the OSDs going down and up in the lightest way possible.
>>>>>
>>>>> Is there any tunable we should consider in order to avoid service degradation for our ceph clients?
>>>>>
>>>>> Regards,
>>>>> Kostis
>>
>> As far as I have seen this problem, the main issue for regular disk-backed OSDs is IOPS starvation for some interval after the OSD reads its maps from the filestore and marks itself 'in' - even if the in-memory caches are still hot, I/O will degrade significantly for a short period. A possible workaround for an otherwise healthy cluster and a node-wide restart is to set the norecover flag; it greatly reduces the chance of hitting slow operations. Of course this is only applicable to a non-empty cluster with tens of percent of average utilization on rotating media. I first pointed out this issue a couple of years ago (it *does* break a 30 s I/O SLA for the returning OSD, whereas refilling the same OSD from scratch would not violate that SLA, at the cost of a much longer completion time for the refill). From the UX side, it would be great to introduce some kind of recovery throttler for newly started OSDs, since recovery_delay_start does not prevent recovery from starting immediately.
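For the archives: if I read Jan's trick above correctly, the sequence for a planned node restart would look roughly like the following. This is only a sketch of how I understand it - we have not tried it yet, the OSD ids are made up, and I am assuming the high delay has to go into ceph.conf so that the booting OSDs pick it up:

    # in ceph.conf on the node to be restarted, before starting the OSDs
    [osd]
    osd recovery delay start = 200

    # start the OSDs one at a time (we currently leave ~1 minute between them)
    service ceph start osd.10
    service ceph start osd.11
    ...

    # once all OSDs are up and in, let recovery proceed normally
    ceph tell osd.* injectargs '--osd-recovery-delay-start 0'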
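And the norecover variant Andrey describes above would, as far as I understand it, just be the usual flag dance around the reboot:

    ceph osd set noout
    ceph osd set norecover
    # reboot the node / restart its OSDs and wait until they are all back up and in
    ceph osd unset norecover
    ceph osd unset noout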