Hello
We have a cluster of 10 Ceph servers.
The cluster hosts an EC pool with a replicated SSD cache tier, used by
OpenStack Cinder for volume storage in our production environment.
For the last two days we have been observing messages like this in the logs:
2017-07-05 10:50:13.451987 osd.114 [WRN] slow request 1165.927215
seconds old, received at 2017-07-05 10:30:47.104746:
osd_op(osd.130.50779:43441 11.57a05c54
rbd_data.5bc14d3135d111a.0000000000000084 [copy-get max 8388608] snapc
0=[]
ack+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected
e50881) currently waiting for rw locks
In this example:
* osd.114 is on the HDD backend hosting the EC pool
* osd.130 is on the SSD cache tier
We've analyzed the logs and found that the RBD image listed above
[rbd_data.5bc14d3135d111a] has been causing problems from the very
beginning. The virtual machine backed by this volume (OpenStack uses the
Ceph cluster as backend storage for Cinder) is DOWN/STOPPED, so we
conclude the problem lies on the cluster side, not the client side.
Unfortunately, this results in a huge number of blocked requests and
growing RAM consumption; eventually the system restarts the OSD daemon
and the cycle repeats.
We've tried temporarily marking the problematic OSDs down, but the
problem just propagates to a different OSD pair.
Running "ceph daemon osd.<ID> dump_ops_in_flight" against a problematic
OSD causes the OSD to hang (the command never returns), and within a few
minutes the cluster marks that OSD down.
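For anyone trying to reproduce this: wrapping the admin-socket query in timeout(1) at least keeps the shell usable when the OSD hangs (the 5-second limit is an arbitrary choice of ours, and osd.114 is simply the OSD from the log above):

```shell
# Wrap the admin-socket query in timeout(1) so a hung OSD does not
# also hang the querying shell; 5 seconds is an arbitrary limit.
timeout 5 ceph daemon osd.114 dump_ops_in_flight \
  || echo "dump_ops_in_flight failed or timed out"

# dump_historic_ops shows recently completed slow ops and may still
# respond when dump_ops_in_flight does not.
timeout 5 ceph daemon osd.114 dump_historic_ops \
  || echo "dump_historic_ops failed or timed out"
```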
The SSD model used in the cache tier pool is the Samsung MZ7KM240.
Could anyone tell us what these log messages mean? Has anyone seen such
a problem and could help us diagnose/repair it?
Thanks for any help
-------------------------------------------------
Pawel Woszuk
PSNC, Poznan Supercomputing and Networking Center
Poznań, Poland
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com