Are you also seeing OSDs marking themselves down for a little while and
then coming back up?  There are two very likely problems causing or
contributing to this.  The first is heavy snapshot use: deleting snapshots
is a very expensive operation for your cluster and can cause a lot of
slowness.  The second is PG subfolder splitting, which shows up as blocked
requests and OSDs marking themselves down and coming back up a little
later without any errors in their logs.  I've linked a previous thread
below where someone was hitting the same problems and both causes were
investigated.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg36923.html
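
If snapshot deletion turns out to be the bigger culprit, the usual first
step is throttling the trimming and split work.  As a rough sketch, something
like the following in ceph.conf on the OSD nodes; the values are only
illustrative, so check each option against your exact release before
relying on it:

    [osd]
    # throttle snap trimming; note that on some Jewel builds this sleep
    # happens in the op thread, so test carefully before rolling it out
    osd snap trim sleep = 0.1
    # raise the per-subfolder file limit before a filestore split and
    # disable merging (negative value); only affects future splits
    filestore split multiple = 8
    filestore merge threshold = -10

Changing those settings does not re-split directories that already exist;
that is what the offline apply-layout-settings run below is for.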

If you are on 0.94.9 or 10.2.5 or later, you can split your PG subfolders
sanely while the OSDs are temporarily stopped, using ceph-objectstore-tool's
'apply-layout-settings' operation.  There are a lot of ways to skin the
snap trimming cat, but the right one depends greatly on your use case.
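
For reference, a minimal sketch of that offline split for a single OSD
(the OSD id, paths and pool name are just examples, and the service
commands assume systemd; confirm the exact flags against the
ceph-objectstore-tool man page for your release):

    ceph osd set noout                        # keep data from rebalancing away
    systemctl stop ceph-osd@12                # example OSD id
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --op apply-layout-settings \
        --pool rbd                            # example pool name
    systemctl start ceph-osd@12
    ceph osd unset noout

Doing this one OSD (or one failure domain) at a time keeps the impact on
client I/O down.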

On Mon, Aug 7, 2017 at 11:49 PM Mclean, Patrick <patrick.mcl...@sony.com>
wrote:

> High CPU utilization and inexplicably slow I/O requests
>
> We have been having similar performance issues across several ceph
> clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK
> for a while, but eventually performance worsens and the cluster goes (at
> first intermittently, but eventually continually) HEALTH_WARN due to slow
> I/O requests blocked for longer than 32 sec. These slow requests are
> accompanied by "currently waiting for rw locks", but we have not found any
> of the network issues that are normally responsible for this warning.
>
> Examining the individual slow OSDs from `ceph health detail` has been
> unproductive; there don't seem to be any slow disks and if we stop the
> OSD the problem just moves somewhere else.
>
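
A side note on that: when requests are stuck on "waiting for rw locks", the
per-OSD admin socket is usually more revealing than the disks.  Something
along these lines on the host of a flagged OSD, with the OSD id being an
example:

    ceph daemon osd.12 dump_ops_in_flight     # what each blocked op is currently waiting on
    ceph daemon osd.12 dump_historic_ops      # recent slow ops with per-step timestamps

The event timestamps in those dumps show which step the ops are sitting in.
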
> We also think this trends with an increased number of RBDs on the
> clusters, but not necessarily with a ton of Ceph I/O. At the same time,
> user %CPU time spikes to 95-100%, at first frequently and then
> consistently, simultaneously across all cores. We are running 12 OSDs per
> node on a 6-core 2.2 GHz CPU with 64 GiB RAM.
>
> ceph1 ~ $ sudo ceph status
>     cluster XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
>      health HEALTH_WARN
>             547 requests are blocked > 32 sec
>      monmap e1: 3 mons at
>
> {cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0}
>             election epoch 16, quorum 0,1,2
>
> cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX
>      osdmap e577122: 72 osds: 68 up, 68 in
>             flags sortbitwise,require_jewel_osds
>       pgmap v6799002: 4096 pgs, 4 pools, 13266 GB data, 11091 kobjects
>             126 TB used, 368 TB / 494 TB avail
>                 4084 active+clean
>                   12 active+clean+scrubbing+deep
>   client io 113 kB/s rd, 11486 B/s wr, 135 op/s rd, 7 op/s wr
>
> ceph1 ~ $ vmstat 5 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd   free   buff    cache   si   so    bi    bo    in     cs us sy id wa st
> 27  1      0 3112660 165544 36261692    0    0   472  1274     0      1 22  1 76  1  0
> 25  0      0 3126176 165544 36246508    0    0   858 12692 12122 110478 97  2  1  0  0
> 22  0      0 3114284 165544 36258136    0    0     1  6118  9586 118625 97  2  1  0  0
> 11  0      0 3096508 165544 36276244    0    0     8  6762 10047 188618 89  3  8  0  0
> 18  0      0 2990452 165544 36384048    0    0  1209 21170 11179 179878 85  4 11  0  0
>
> There is no apparent memory shortage, and none of the HDDs or SSDs show
> consistently high utilization, slow service times, or any other sign of
> hardware saturation; the only thing that is pegged is user CPU. Can CPU
> starvation be responsible for "waiting for rw locks"?
>
> Our main pool (the one with all the data) currently has 1024 PGs,
> leaving us room to add more PGs if needed, but we're concerned that doing
> so would consume even more CPU.
>
> We have moved to running Ceph with jemalloc instead of tcmalloc, and that
> has helped with CPU utilization somewhat, but we still see occurrences of
> 95-100% CPU under a not terribly high Ceph workload.
>
> Any suggestions of what else to look at? We have a peculiar use case
> where we have many RBDs but only about 1-5% of them are active at the
> same time, and we're constantly making and expiring RBD snapshots. Could
> this lead to aberrant performance? For instance, is it normal to have
> ~40k snaps still in cached_removed_snaps?
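
On the cached_removed_snaps question: the set of removed snapshots the OSDs
have to track is visible per pool in the osdmap, e.g.:

    ceph osd dump | grep removed_snaps        # interval sets of deleted snaps per pool

A very large interval set there can mean extra work on every osdmap change
and every snap trim pass, which fits a pattern of constant snapshot creation
and expiry.
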
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
