Re: [ceph-users] Replication lag in block storage

Greg Poirier Thu, 13 Mar 2014 17:49:13 -0700

Many of the sub ops look like this, with significant lag between
received_at and commit_sent:


        { "description": "osd_op(client.6869831.0:1192491
rbd_data.67b14a2ae8944a.0000000000009105 [write 507904~3686400] 6.556a4db0
e660)",
          "received_at": "2014-03-13 20:42:05.811936",
          "age": "46.088198",
          "duration": "0.038328",
<snip>
                    { "time": "2014-03-13 20:42:05.850215",
                      "event": "commit_sent"},
                    { "time": "2014-03-13 20:42:05.850264",
                      "event": "done"}]]},

In this case almost 39ms between received_at and commit_sent.

A particularly egregious example of 80+ms lag between received_at and
commit_sent:

       { "description": "osd_op(client.6869831.0:1190526
rbd_data.67b14a2ae8944a.0000000000008fac [write 3325952~868352] 6.5255f5fd
e660)",
          "received_at": "2014-03-13 20:41:40.227813",
          "age": "320.017087",
          "duration": "0.086852",
<snip>
                    { "time": "2014-03-13 20:41:40.314633",
                      "event": "commit_sent"},
                    { "time": "2014-03-13 20:41:40.314665",
                      "event": "done"}]]},



On Thu, Mar 13, 2014 at 4:17 PM, Gregory Farnum <g...@inktank.com> wrote:

> On Thu, Mar 13, 2014 at 3:56 PM, Greg Poirier <greg.poir...@opower.com>
> wrote:
> > We've been seeing this issue on all of our dumpling clusters, and I'm
> > wondering what might be the cause of it.
> >
> > In dump_historic_ops, the time between op_applied and sub_op_commit_rec
> or
> > the time between commit_sent and sub_op_applied is extremely high. Some
> of
> > the osd_sub_ops are as long as 100 ms. A sample dump_historic_ops is
> > included at the bottom.
>
> It's important to understand what each of those timestamps are reporting.
>
> op_applied: the point at which an OSD has applied an operation to its
> readable backing filesystem in-memory (which for xfs or ext4 will be
> after it's committed to the journal)
> sub_op_commit_rec: the point at which an OSD has gotten commits from
> the replica OSDs
> commit_sent: the point at which a replica OSD has sent a commit back
> to its primary
> sub_op_applied: the point at which a replica OSD has applied a
> particular operation to its backing filesystem in-memory (again, after
> the journal if using xfs)
>
> Reads are never served from replicas, so a long time between
> commit_sent and sub_op_applied should not in itself be an issue. A lag
> time between op_applied and sub_op_commit_rec means that the OSD is
> waiting on its replicas. A long time there indicates either that the
> replica is processing slowly, or that there's some issue in the
> communications stack (all the way from the raw ethernet up to the
> message handling in the OSD itself).
> So the first thing to look for are sub ops which have a lag time
> between the received_at and commit_sent timestamps. If none of those
> ever turn up, but unusually long waits for sub_op_commit_rec are still
> present, then it'll take more effort to correlate particular subops on
> replicas with the op on the primary they correspond to, and see where
> the time lag is coming into it.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Replication lag in block storage

Reply via email to