Hi Manuel, hello list,

what is the reason for manually compacting the OSD?

In past debugging sessions we saw slow requests appear in correlation with 
compaction messages of the involved OSDs.
In this situation we have not seen such messages (see the logfile in my last 
mail below).

As described, our cluster runs 12.2.10 and our Xen dom0 clients mainly 
use 10.2.5 with rbd-nbd.

It seems there has been some activity to improve slow-request problems in 
recent Ceph releases, given that users have reported problems in this area 
over the last months.
Can you recommend updating rbd-nbd or the cluster itself to improve the 
situation?

Best regards
Marc

On 13.05.19 at 07:40, EDH - Manuel Rios Fernandez wrote:
> Hi Marc,
>
> Try to compact OSD with slow request 
>
> ceph tell osd.[ID] compact
>
> This will take the OSD offline for some seconds (SSD) to minutes (HDD) and 
> perform a compaction of the OMAP database.
>
> Regards,
>
> -----Original Message-----
> From: ceph-users <ceph-users-boun...@lists.ceph.com> On behalf of Marc Schöchlin
> Sent: Monday, May 13, 2019 6:59 AM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Slow requests from bluestore osds
>
> Hello cephers,
>
> one week ago we replaced the bluestore cache size settings with "osd memory 
> target" and removed the detailed memory settings.
> This storage class now runs 42 * 8TB spinners with a permanent write workload 
> of 2000-3000 write IOPS and 1200-8000 read IOPS.
>
> Our new setup is now:
> (12.2.10 on Ubuntu 16.04)
>
> [osd]
> osd deep scrub interval = 2592000
> osd scrub begin hour = 19
> osd scrub end hour = 6
> osd scrub load threshold = 6
> osd scrub sleep = 0.3
> osd snap trim sleep = 0.4
> pg max concurrent snap trims = 1
>
> [osd.51]
> osd memory target = 8589934592
> ...
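>
> As a side note, to double-check that a per-OSD override like this is really 
> active, the value can be read back through the admin socket on the OSD host. 
> This is just a sketch (osd.51 as an example, not verified on every release):
>
> ```
> # read the effective value from the running daemon via the admin socket
> ceph daemon osd.51 config get osd_memory_target
> ```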
>
> After that (restarting the entire cluster with these settings) we were very 
> happy not to see any slow requests for 7 days.
>
> Unfortunately, last night the slow requests returned on one OSD without any 
> known change in the workload over the last 14 days (according to our detailed 
> monitoring):
>
> 2019-05-12 22:00:00.000117 mon.ceph-mon-s43 [INF] overall HEALTH_OK
> 2019-05-12 23:00:00.000130 mon.ceph-mon-s43 [INF] overall HEALTH_OK
> 2019-05-13 00:00:00.000129 mon.ceph-mon-s43 [INF] overall HEALTH_OK
> 2019-05-13 00:00:44.069793 mon.ceph-mon-s43 [WRN] Health check failed: 416 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:00:50.151190 mon.ceph-mon-s43 [WRN] Health check update: 439 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:00:59.750398 mon.ceph-mon-s43 [WRN] Health check update: 452 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:04.750697 mon.ceph-mon-s43 [WRN] Health check update: 283 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:10.419801 mon.ceph-mon-s43 [WRN] Health check update: 230 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:19.751516 mon.ceph-mon-s43 [WRN] Health check update: 362 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:24.751822 mon.ceph-mon-s43 [WRN] Health check update: 324 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:30.675160 mon.ceph-mon-s43 [WRN] Health check update: 341 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:38.759012 mon.ceph-mon-s43 [WRN] Health check update: 390 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:44.858392 mon.ceph-mon-s43 [WRN] Health check update: 366 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:54.753388 mon.ceph-mon-s43 [WRN] Health check update: 352 
> slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:59.045220 mon.ceph-mon-s43 [INF] Health check cleared: 
> REQUEST_SLOW (was: 168 slow requests are blocked > 32 sec. Implicated osds 51)
> 2019-05-13 00:01:59.045257 mon.ceph-mon-s43 [INF] Cluster is now healthy
> 2019-05-13 01:00:00.000114 mon.ceph-mon-s43 [INF] overall HEALTH_OK
> 2019-05-13 02:00:00.000130 mon.ceph-mon-s43 [INF] overall HEALTH_OK
>
>
> The output of a "ceph health detail" loop at the time the problem occurred:
>
> Mon May 13 00:01:27 CEST 2019
> HEALTH_WARN 324 slow requests are blocked > 32 sec. Implicated osds 51 
> REQUEST_SLOW 324 slow requests are blocked > 32 sec. Implicated osds 51
>     324 ops are blocked > 32.768 sec
>     osd.51 has blocked requests > 32.768 sec
>
> The logfile of the OSD:
>
> 2019-05-12 23:57:28.767463 7f38da4e2700  4 rocksdb: (Original Log Time 
> 2019/05/12-23:57:28.767419) 
> [/build/ceph-12.2.10/src/rocksdb/db/db_impl_compaction_flush.cc:132] 
> [default] Level summary: base level 1 max bytes base 268435456 files[2 4 21 
> 122 0 0 0] max score 0.94
>
> 2019-05-12 23:57:28.767511 7f38da4e2700  4 rocksdb: 
> [/build/ceph-12.2.10/src/rocksdb/db/db_impl_files.cc:388] [JOB 2991] Try to 
> delete WAL files size 256700142, prev total WAL file size 257271487, number 
> of live WAL files 2.
>
> 2019-05-12 23:58:07.816376 7f38ddce9700  0 log_channel(cluster) log [DBG] : 
> 34.ac scrub ok
> 2019-05-12 23:59:54.070025 7f38de4ea700  0 log_channel(cluster) log [DBG] : 
> 34.236 scrub starts
> 2019-05-13 00:02:21.818689 7f38de4ea700  0 log_channel(cluster) log [DBG] : 
> 34.236 scrub ok
> 2019-05-13 00:04:37.613094 7f38ead03700  4 rocksdb: 
> [/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:684] reusing log 422507 
> from recycle list
>
> 2019-05-13 00:04:37.613186 7f38ead03700  4 rocksdb: 
> [/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:725] [default] New 
> memtable created with log file: #422511. Immutable memtables: 0.
>
> Any hints on how to find more details about the origin of this problem?
> How can we solve it?
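>
> What we plan to try next is querying the admin socket of the affected OSD 
> while the problem occurs, to see in which state the blocked ops spend their 
> time. A sketch, to be run on the OSD host of a live cluster (osd.51 as an 
> example):
>
> ```
> # ops currently in flight, including how long each has been blocked
> ceph daemon osd.51 dump_ops_in_flight
>
> # the slowest recently completed ops, with per-event timestamps
> # (queued -> reached_pg -> commit, etc.)
> ceph daemon osd.51 dump_historic_ops
> ```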
>
> Regards
> Marc
>
> On 28.01.19 at 22:27, Marc Schöchlin wrote:
>> Hello cephers,
>>
>> as described - we also have the slow requests in our setup.
>>
>> We recently updated from Ceph 12.2.4 to 12.2.10, updated Ubuntu 16.04 to the 
>> latest patch level (with kernel 4.15.0-43) and applied Dell firmware 2.8.0.
>>
>> On 12.2.5 (before updating the cluster) we saw slow requests at a frequency 
>> of 10 to 30 minutes throughout the entire deep-scrub window between 8:00 PM 
>> and 6:00 AM, especially between 4:00 AM and 6:00 AM, when we sequentially 
>> create an rbd snapshot for every rbd image and delete an outdated snapshot 
>> (we keep 3 snapshots per rbd device).
>>
>> After the upgrade to 12.2.10 (and the other patches) the slow requests seem 
>> to be reduced, but they still occur after the snapshot creation/deletion 
>> procedure.
>> Today we changed the time of the creation/deletion procedure from 4:00 AM to 
>> 7:30 PM, and we experienced slow requests right in the snapshot process at 
>> 8:00 PM.
>>
>> The slow requests only happen on the OSDs of a certain storage class (30 * 
>> 8TB spinners) - i.e. SSD OSDs on the same cluster do not have this problem.
>> The pools which use this storage class carry a load of 80% write requests.
>>
>> Our configuration looks like this:
>> ---
>> bluestore cache kv max = 2147483648
>> bluestore cache kv ratio = 0.9
>> bluestore cache meta ratio = 0.1
>> bluestore cache size hdd = 10737418240
>> osd deep scrub interval = 2592000
>> osd scrub begin hour = 19
>> osd scrub end hour = 6
>> osd scrub load threshold = 4
>> osd scrub sleep = 0.3
>> osd max trimming pgs = 2
>> ---
>> We do not have that many devices in this storage class (an enhancement is 
>> in progress to get more IOPS).
>>
>> What can I do to decrease the impact of snaptrims to prevent slow requests?
>> (e.g. reduce "osd max trimming pgs" to "1")
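>>
>> An untested sketch of what we would put in ceph.conf to throttle snaptrim 
>> further (the values are assumptions, not measured recommendations):
>>
>> ```
>> [osd]
>> # longer pause between snap trim operations (seconds) to spread the I/O
>> osd snap trim sleep = 0.5
>> # allow only one PG per OSD to trim snapshots concurrently
>> osd max trimming pgs = 1
>> ```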
>>
>> Regards
>> Marc Schöchlin
>>
>> On 03.09.18 at 10:13, Marc Schöchlin wrote:
>>> Hi,
>>>
>>> we have also been experiencing this type of behavior for some weeks on our 
>>> not so performance-critical HDD pools.
>>> We haven't spent much time on this problem yet, because there are currently 
>>> more important tasks - but here are a few details:
>>>
>>> Running the following loop produces the output below:
>>>
>>> while true; do ceph health | grep -q HEALTH_OK || (date; ceph health detail); sleep 2; done
>>>
>>> Sun Sep  2 20:59:47 CEST 2018
>>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>>>     4 ops are blocked > 32.768 sec
>>>     osd.43 has blocked requests > 32.768 sec
>>> Sun Sep  2 20:59:50 CEST 2018
>>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>>>     4 ops are blocked > 32.768 sec
>>>     osd.43 has blocked requests > 32.768 sec
>>> Sun Sep  2 20:59:52 CEST 2018
>>> HEALTH_OK
>>> Sun Sep  2 21:00:28 CEST 2018
>>> HEALTH_WARN 1 slow requests are blocked > 32 sec
>>> REQUEST_SLOW 1 slow requests are blocked > 32 sec
>>>     1 ops are blocked > 32.768 sec
>>>     osd.41 has blocked requests > 32.768 sec
>>> Sun Sep  2 21:00:31 CEST 2018
>>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>>     7 ops are blocked > 32.768 sec
>>>     osds 35,41 have blocked requests > 32.768 sec
>>> Sun Sep  2 21:00:33 CEST 2018
>>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>>     7 ops are blocked > 32.768 sec
>>>     osds 35,51 have blocked requests > 32.768 sec
>>> Sun Sep  2 21:00:35 CEST 2018
>>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>>     7 ops are blocked > 32.768 sec
>>>     osds 35,51 have blocked requests > 32.768 sec
>>>
>>> Our details:
>>>
>>>   * system details:
>>>     * Ubuntu 16.04
>>>     * Kernel 4.13.0-39
>>>     * 30 * 8 TB disks (SEAGATE ST8000NM0075)
>>>     * 3 * Dell PowerEdge R730xd (firmware 2.50.50.50)
>>>       * Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>>>       * 2 * 10 GBit/s SFP+ network adapters
>>>       * 192GB RAM
>>>     * pools are using replication factor 3, 2MB object size,
>>>       85% write load, 1700 write IOPS/sec
>>>       (ops mainly between 4k and 16k size), 300 read IOPS/sec
>>>   * we have the impression that this appears during deep-scrub/scrub activity
>>>   * Ceph 12.2.5; we already played with the OSD settings below
>>>     (our assumption was that the problem is related to rocksdb compaction)
>>>     bluestore cache kv max = 2147483648
>>>     bluestore cache kv ratio = 0.9
>>>     bluestore cache meta ratio = 0.1
>>>     bluestore cache size hdd = 10737418240
>>>   * this type of problem only appears on hdd/bluestore osds; ssd/bluestore
>>>     osds have never experienced it
>>>   * the system is healthy: no swapping, no high load, no errors in dmesg
>>>
>>> I attached a log excerpt of osd.35 - perhaps this is useful for 
>>> investigating the problem if someone has deeper bluestore knowledge.
>>> (slow requests appeared on Sun Sep  2 21:00:35)
>>>
>>> Regards
>>> Marc
>>>
>>>
>>> On 02.09.2018 at 15:50, Brett Chancellor wrote:
>>>> The warnings look like this.
>>>>
>>>> 6 ops are blocked > 32.768 sec on osd.219
>>>> 1 osds have slow requests
>>>>
>>>> On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza <ad...@redhat.com> wrote:
>>>>
>>>>     On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
>>>>     <bchancel...@salesforce.com> wrote:
>>>>     > Hi Cephers,
>>>>     >   I am in the process of upgrading a cluster from Filestore to
>>>>     bluestore,
>>>>     > but I'm concerned about frequent warnings popping up against the new
>>>>     > bluestore devices. I'm frequently seeing messages like this,
>>>>     although the
>>>>     > specific osd changes, it's always one of the few hosts I've
>>>>     converted to
>>>>     > bluestore.
>>>>     >
>>>>     > 6 ops are blocked > 32.768 sec on osd.219
>>>>     > 1 osds have slow requests
>>>>     >
>>>>     > I'm running 12.2.4, have any of you seen similar issues? It
>>>>     seems as though
>>>>     > these messages pop up more frequently when one of the bluestore
>>>>     pgs is
>>>>     > involved in a scrub.  I'll include my bluestore creation process
>>>>     below, in
>>>>     > case that might cause an issue. (sdb, sdc, sdd are SATA, sde and
>>>>     sdf are
>>>>     > SSD)
>>>>
>>>>     Would be useful to include what those warnings say. The ceph-volume
>>>>     commands look OK to me
>>>>
>>>>     >
>>>>     >
>>>>     > ## Process used to create osds
>>>>     > sudo ceph-disk zap /dev/sdb /dev/sdc /dev/sdd /dev/sdd /dev/sde
>>>>     /dev/sdf
>>>>     > sudo ceph-volume lvm zap /dev/sdb
>>>>     > sudo ceph-volume lvm zap /dev/sdc
>>>>     > sudo ceph-volume lvm zap /dev/sdd
>>>>     > sudo ceph-volume lvm zap /dev/sde
>>>>     > sudo ceph-volume lvm zap /dev/sdf
>>>>     > sudo sgdisk -n 0:2048:+133GiB -t 0:FFFF -c 1:"ceph block.db sdb"
>>>>     /dev/sdf
>>>>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 2:"ceph block.db sdc"
>>>>     /dev/sdf
>>>>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 3:"ceph block.db sdd"
>>>>     /dev/sdf
>>>>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 4:"ceph block.db sde"
>>>>     /dev/sdf
>>>>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>>>>     --data
>>>>     > /dev/sdb --block.db /dev/sdf1
>>>>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>>>>     --data
>>>>     > /dev/sdc --block.db /dev/sdf2
>>>>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>>>>     --data
>>>>     > /dev/sdd --block.db /dev/sdf3
>>>>     >
>>>>     >
>>>>     > _______________________________________________
>>>>     > ceph-users mailing list
>>>>     > ceph-users@lists.ceph.com
>>>>     > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>     >
>>>>
>>>>
>>>>
>
