Just so I understand, the duration for this operation is 329 seconds (a lot
!) but all the reported events happened ~ at the same time (2019-07-20
23:13:18)
Were all the events of this ops reported ?

Why do you see a problem with the "waiting for subops from 4" event ?

Thanks, Massimo

On Wed, Jul 24, 2019 at 9:31 AM Wido den Hollander <w...@42on.com> wrote:

>
>
> On 7/20/19 6:06 PM, Wei Zhao wrote:
> > Hi ceph users:
> > I was doing  write benchmark, and found some io will be blocked for a
> > very long time. The following log is one op , it seems to wait for
> > replica to finish. My ceph version is 12.2.4, and the pool is 3+2 EC .
> > Does anyone give me some adives about how I sould debug next ?
> >
> > {
> >     "ops": [
> >         {
> >             "description": "osd_op(client.17985.0:670679 39.18
> >
> 39:1a63fc5c:::benchmark_data_SH-IDC1-10-5-37-174_2917453_object670678:head
> > [set-alloc-hint object_size 1048576 write_size 1048576,write
> > 0~1048576] snapc 0=[] ondisk+write+known_if_redirected e1135)",
> >             "initiated_at": "2019-07-20 23:13:18.725466",
> >             "age": 329.248875,
> >             "duration": 329.248901,
> >             "type_data": {
> >                 "flag_point": "waiting for sub ops",
> >                 "client_info": {
> >                     "client": "client.17985",
> >                     "client_addr": "10.5.137.174:0/1544466091",
> >                     "tid": 670679
> >                 },
> >                 "events": [
> >                     {
> >                         "time": "2019-07-20 23:13:18.725466",
> >                         "event": "initiated"
> >                     },
> >                     {
> >                         "time": "2019-07-20 23:13:18.726585",
> >                         "event": "queued_for_pg"
> >                     },
> >                     {
> >                         "time": "2019-07-20 23:13:18.726606",
> >                         "event": "reached_pg"
> >                     },
> >                     {
> >                         "time": "2019-07-20 23:13:18.726752",
> >                         "event": "started"
> >                     },
> >                     {
> >                         "time": "2019-07-20 23:13:18.726842",
> >                         "event": "waiting for subops from 4"
> >                     },
>
> This usually indicates there is something going on with osd.4
>
> I would go and see if osd.4 is very busy at that moment and check if the
> disk is 100% busy in iostat.
>
> It could be a number of things, but I would check that first.
>
> Wido
>
> >                     {
> >                         "time": "2019-07-20 23:13:18.743134",
> >                         "event": "op_commit"
> >                     },
> >                     {
> >                         "time": "2019-07-20 23:13:18.743137",
> >                         "event": "op_applied"
> >                     }
> >                 ]
> >             }
> >         },
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to