Thanks for your answers,
we will also experiment with osd recovery max active / threads and
will come back to you
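
For reference, these could be tried either in ceph.conf or injected at
runtime; the values below are only illustrative, not recommendations:

    # ceph.conf, [osd] section
    osd recovery max active = 1
    osd recovery threads = 1

    # or at runtime, e.g.
    ceph tell osd.* injectargs '--osd-recovery-max-active 1'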

Regards,
Kostis

On 16 July 2015 at 12:29, Jan Schermer <j...@schermer.cz> wrote:
> For me setting recovery_delay_start helps during the OSD bootup _sometimes_, 
> but it clearly does something different than what’s in the docs.
>
> Docs say:
> After peering completes, Ceph will delay for the specified number of seconds 
> before starting to recover objects.
>
> However, what I see is greatly slowed recovery, not a delayed start of 
> recovery. It seems to basically sleep between recovering the PGs. AFAIK 
> peering is already done unless I was remapping the PGs at the same moment, so 
> not sure what’s happening there in reality.
>
> We had this set to 20 for some time and recovering after host restart took 
> close to two hours.
> With this parameter set to 0, it recovered in less than 30 seconds (and 
> caused no slow requests or anything).
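>
> A quick way to see which value an OSD is actually running with is the admin
> socket on the OSD’s host, e.g. for osd.0 (picked only as an example):
>
>     ceph daemon osd.0 config show | grep osd_recovery_delay_start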
>
> So what I usually do is set this to a high number (like 200), and after all 
> the OSDs are started I set it to 0. This does not completely prevent slow 
> requests from happening, but does somewhat help…
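>
> As a concrete sketch of that sequence (values as described above, injected
> at runtime):
>
>     ceph tell osd.* injectargs '--osd-recovery-delay-start 200'
>     # start/restart the OSDs and wait for them to come up
>     ceph tell osd.* injectargs '--osd-recovery-delay-start 0'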
>
> Jan
>
>> On 15 Jul 2015, at 11:52, Andrey Korolyov <and...@xdel.ru> wrote:
>>
>> On Wed, Jul 15, 2015 at 12:15 PM, Jan Schermer <j...@schermer.cz> wrote:
>>> We have the same problems, we need to start the OSDs slowly.
>>> The problem seems to be CPU congestion. A booting OSD will use all 
>>> available CPU power you give it, and if it doesn’t have enough, nasty stuff 
>>> happens (this might actually be the manifestation of some kind of problem 
>>> in our setup as well).
>>> It doesn’t always do that - I was restarting our hosts this weekend and 
>>> most of them came up fine with a simple “service ceph start”, some just sat 
>>> there spinning the CPU and not doing any real work (and the cluster was 
>>> not very happy about that).
>>>
>>> Jan
>>>
>>>
>>>> On 15 Jul 2015, at 10:53, Kostis Fardelas <dante1...@gmail.com> wrote:
>>>>
>>>> Hello,
>>>> after some trial and error we concluded that if we start the 6 stopped
>>>> OSD daemons with a delay of 1 minute between them, we do not experience
>>>> slow requests (the threshold is set at 30 sec), although there are some
>>>> ops that last up to 10s, which is already quite high. I assume that if
>>>> we spread the delay out further, the slow requests will vanish. We
>>>> cannot rule out that our setup is not tuned down to the finest detail,
>>>> but I wonder whether we are missing some tuning in the Ceph
>>>> configuration.
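>>>>
>>>> For illustration, the staggered start can be scripted along these lines
>>>> (sysvinit-style start command; OSD ids 0-5 assumed only as an example):
>>>>
>>>>     for id in 0 1 2 3 4 5; do
>>>>         service ceph start osd.$id
>>>>         sleep 60
>>>>     done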
>>>>
>>>> We run the latest stable Firefly release.
>>>>
>>>> Regards,
>>>> Kostis
>>>>
>>>> On 13 July 2015 at 13:28, Kostis Fardelas <dante1...@gmail.com> wrote:
>>>>> Hello,
>>>>> after rebooting a Ceph node, once the OSDs start booting and joining
>>>>> the cluster we experience slow requests that get resolved immediately
>>>>> after the cluster recovers. It is important to note that before the node
>>>>> reboot we set the noout flag in order to prevent recovery - so there are
>>>>> only degraded PGs while the OSDs are down - and let the cluster handle
>>>>> the OSDs going down and up in the lightest way possible.
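>>>>>
>>>>> For reference, that is done with the standard flag commands:
>>>>>
>>>>>     ceph osd set noout
>>>>>     # ... reboot the node, wait for the OSDs to rejoin ...
>>>>>     ceph osd unset noout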
>>>>>
>>>>> Is there any tunable we should consider in order to avoid service
>>>>> degradation for our ceph clients?
>>>>>
>>>>> Regards,
>>>>> Kostis
>>
>> As far as I have seen this problem, the main issue for regular
>> disk-backed OSDs is IOPS starvation during some interval after the OSD
>> reads its maps from the filestore and marks itself as 'in' - even if
>> in-memory caches are still hot, I/O will degrade significantly for a
>> short period. A possible workaround for an otherwise healthy cluster
>> and a node-wide restart is to set the norecover flag; it greatly
>> reduces the chance of hitting slow operations. Of course it is
>> applicable only to a non-empty cluster with tens of percent average
>> utilization on rotating media. I first pointed out this issue a couple
>> of years ago (it *does* break a 30s I/O SLA for the returning OSD, but
>> refilling the same OSDs from scratch would not violate that SLA, while
>> taking far longer to complete). From the UX side, it would be great to
>> introduce some kind of recovery throttler for newly started OSDs, as
>> recovery_delay_start does not prevent immediate recovery procedures.
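>>
>> For completeness, the flag in question is toggled like this (set before
>> the restart, cleared once the OSDs are back up and peered):
>>
>>     ceph osd set norecover
>>     # restart the node / OSDs, wait for them to rejoin and peer
>>     ceph osd unset norecover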
>
