Many of the sub ops look like this, with significant lag between received_at and commit_sent:
{ "description": "osd_op(client.6869831.0:1192491 rbd_data.67b14a2ae8944a.0000000000009105 [write 507904~3686400] 6.556a4db0 e660)", "received_at": "2014-03-13 20:42:05.811936", "age": "46.088198", "duration": "0.038328", <snip> { "time": "2014-03-13 20:42:05.850215", "event": "commit_sent"}, { "time": "2014-03-13 20:42:05.850264", "event": "done"}]]}, In this case almost 39ms between received_at and commit_sent. A particularly egregious example of 80+ms lag between received_at and commit_sent: { "description": "osd_op(client.6869831.0:1190526 rbd_data.67b14a2ae8944a.0000000000008fac [write 3325952~868352] 6.5255f5fd e660)", "received_at": "2014-03-13 20:41:40.227813", "age": "320.017087", "duration": "0.086852", <snip> { "time": "2014-03-13 20:41:40.314633", "event": "commit_sent"}, { "time": "2014-03-13 20:41:40.314665", "event": "done"}]]}, On Thu, Mar 13, 2014 at 4:17 PM, Gregory Farnum <g...@inktank.com> wrote: > On Thu, Mar 13, 2014 at 3:56 PM, Greg Poirier <greg.poir...@opower.com> > wrote: > > We've been seeing this issue on all of our dumpling clusters, and I'm > > wondering what might be the cause of it. > > > > In dump_historic_ops, the time between op_applied and sub_op_commit_rec > or > > the time between commit_sent and sub_op_applied is extremely high. Some > of > > the osd_sub_ops are as long as 100 ms. A sample dump_historic_ops is > > included at the bottom. > > It's important to understand what each of those timestamps are reporting. > > op_applied: the point at which an OSD has applied an operation to its > readable backing filesystem in-memory (which for xfs or ext4 will be > after it's committed to the journal) > sub_op_commit_rec: the point at which an OSD has gotten commits from > the replica OSDs > commit_sent: the point at which a replica OSD has sent a commit back > to its primary > sub_op_applied: the point at which a replica OSD has applied a > particular operation to its backing filesystem in-memory (again, after > the journal if using xfs) > > Reads are never served from replicas, so a long time between > commit_sent and sub_op_applied should not in itself be an issue. A lag > time between op_applied and sub_op_commit_rec means that the OSD is > waiting on its replicas. A long time there indicates either that the > replica is processing slowly, or that there's some issue in the > communications stack (all the way from the raw ethernet up to the > message handling in the OSD itself). > So the first thing to look for are sub ops which have a lag time > between the received_at and commit_sent timestamps. If none of those > ever turn up, but unusually long waits for sub_op_commit_rec are still > present, then it'll take more effort to correlate particular subops on > replicas with the op on the primary they correspond to, and see where > the time lag is coming into it. > -Greg > Software Engineer #42 @ http://inktank.com | http://ceph.com >
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com