Networking is 10GbE. I notice the recovery IO rate is wildly variable; I
assume that's normal.
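
(By "watching recovery IO" I just mean keeping an eye on the recovery line in
the status output, something along these lines, nothing fancier:)

  # refresh the status output every couple of seconds and watch the
  # "recovery io" / "client io" lines at the bottom
  watch -n 2 ceph -s

  # or stream cluster log and pgmap updates continuously
  ceph -w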

There's very little load, as this hasn't gone into production yet; I was
"seeing what it would handle" at the time it broke.

I checked this morning and the slow request had cleared, and I could access
the blocked file again.
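
In case anyone else hits the same thing, the checks I mean are roughly these
(run from the MDS host; the mds name here is mine, stor-vm4):

  # cluster-wide health summary, including the mds slow-request warning
  ceph health detail

  # on the MDS host: dump the requests the MDS currently has in flight/blocked
  ceph daemon mds.stor-vm4 dump_ops_in_flight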

All OSes are Ubuntu 16.04.1 with the stock 4.4.0-72-generic kernel, and
there were two CephFS clients accessing it, also on 16.04.1.

Ceph is 11.2.0 on all of them, installed from the debian-kraken repo at
download.ceph.com. All OSDs are bluestore.
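
For what it's worth, confirming which clients are attached can be done from
the MDS admin socket, roughly like this (the session metadata includes each
client's address and version string):

  # on the MDS host: list connected CephFS client sessions and their metadata
  ceph daemon mds.stor-vm4 session ls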


As of now all is okay, so I don't want to waste anyone's time on a wild goose
chase.
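
If it does come back, my rough plan is to take the stuck object from
objecter_requests and chase it down to the PG/OSD holding things up, something
like this (the pool name below is a placeholder for whatever the CephFS data
pool is actually called; the object and osd ids are from the objecter output
in my previous mail, quoted below):

  # map the stuck object to its PG and acting OSD set
  ceph osd map cephfs_data 100000003ec.003efb9f

  # then query the PG id that prints, and look at the ops stuck on the
  # primary OSD (run the daemon command on the host carrying osd.4)
  ceph pg <pgid-from-osd-map> query
  ceph daemon osd.4 dump_ops_in_flight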
On Wed, May 24, 2017 at 6:15 AM, John Spray <jsp...@redhat.com> wrote:

> On Tue, May 23, 2017 at 11:41 PM, Daniel K <satha...@gmail.com> wrote:
> > Have a 20-OSD cluster ("my first ceph cluster") that has another 400 OSDs
> > en route.
> >
> > I was "beating up" on the cluster, and had been writing to a 6TB file in
> > CephFS for several hours, during which I changed the crushmap to better
> > match my environment, generating a bunch of recovery IO. After about 5.8TB
> > written, one of the OSD hosts (which is also a MON.. soon to be rectified)
> > that had 5 OSDs on it crashed, and after rebooting, I have this in ceph -s:
> > (The degraded/misplaced warnings are likely because the cluster hasn't
> > completed rebalancing after I changed the crushmap, I assume)
> >
>
> Losing a quarter of your OSDs while simultaneously rebalancing after
> editing your CRUSH map is a brutal thing to do to a Ceph cluster, and
> I would expect it to impact your client IO severely.
>
> I see that you've got 112MB/s of recovery going on, which may or may
> not be saturating some links depending on whether you're using 1gig or
> 10gig networking.
>
> > 2017-05-23 18:33:13.775924 7ff9d3230700 -1 WARNING: the following dangerous
> > and experimental features are enabled: bluestore
> > 2017-05-23 18:33:13.781732 7ff9d3230700 -1 WARNING: the following dangerous
> > and experimental features are enabled: bluestore
> >     cluster e92e20ca-0fe6-4012-86cc-aa51e0466661
> >      health HEALTH_WARN
> >             440 pgs backfill_wait
> >             7 pgs backfilling
> >             85 pgs degraded
> >             5 pgs recovery_wait
> >             85 pgs stuck degraded
> >             452 pgs stuck unclean
> >             77 pgs stuck undersized
> >             77 pgs undersized
> >             recovery 196526/3554278 objects degraded (5.529%)
> >             recovery 1690392/3554278 objects misplaced (47.559%)
> >             mds0: 1 slow requests are blocked > 30 sec
> >      monmap e4: 3 mons at
> > {stor-vm1=10.0.15.51:6789/0,stor-vm2=10.0.15.52:6789/0,stor-vm3=10.0.15.53:6789/0}
> >             election epoch 136, quorum 0,1,2 stor-vm1,stor-vm2,stor-vm3
> >       fsmap e21: 1/1/1 up {0=stor-vm4=up:active}
> >         mgr active: stor-vm1 standbys: stor-vm2
> >      osdmap e4655: 20 osds: 20 up, 20 in; 450 remapped pgs
> >             flags sortbitwise,require_jewel_osds,require_kraken_osds
> >       pgmap v192589: 1428 pgs, 5 pools, 5379 GB data, 1345 kobjects
> >             11041 GB used, 16901 GB / 27943 GB avail
> >             196526/3554278 objects degraded (5.529%)
> >             1690392/3554278 objects misplaced (47.559%)
> >                  975 active+clean
> >                  364 active+remapped+backfill_wait
> >                   76 active+undersized+degraded+remapped+backfill_wait
> >                    3 active+recovery_wait+degraded+remapped
> >                    3 active+remapped+backfilling
> >                    3 active+degraded+remapped+backfilling
> >                    2 active+recovery_wait+degraded
> >                    1 active+clean+scrubbing+deep
> >                    1 active+undersized+degraded+remapped+backfilling
> > recovery io 112 MB/s, 28 objects/s
> >
> >
> > Seems related to the "corrupted rbd filesystems since jewel" thread.
> >
> >
> > log entries on the MDS server:
> >
> > 2017-05-23 18:27:12.966218 7f95ed6c0700  0 log_channel(cluster) log [WRN] :
> > slow request 243.113407 seconds old, received at 2017-05-23 18:23:09.852729:
> > client_request(client.204100:5 getattr pAsLsXsFs #100000003ec 2017-05-23
> > 17:48:23.770852 RETRY=2 caller_uid=0, caller_gid=0{}) currently failed to
> > rdlock, waiting
> >
> >
> > output of ceph daemon mds.stor-vm4 objecter_requests (changes each time I
> > run it):
>
> If that changes each time you run it, then it means the OSD requests
> from the MDS are going through.
>
> However, it's possible that you have multiple clients and one of them
> is stuck trying to write something back (to a PG that is not accepting
> the write yet?), thereby preventing the MDS from granting a lock to
> another client.
>
> What clients (+versions) are involved, what's the workload, what
> versions of Ceph?
>
> John
>
> > root@stor-vm4:/var/log/ceph# ceph daemon mds.stor-vm4 objecter_requests
> > {
> >     "ops": [
> >         {
> >             "tid": 66700,
> >             "pg": "1.60e95c32",
> >             "osd": 4,
> >             "object_id": "100000003ec.003efb9f",
> >             "object_locator": "@1",
> >             "target_object_id": "100000003ec.003efb9f",
> >             "target_object_locator": "@1",
> >             "paused": 0,
> >             "used_replica": 0,
> >             "precalc_pgid": 0,
> >             "last_sent": "1.47461e+06s",
> >             "attempts": 1,
> >             "snapid": "head",
> >             "snap_context": "0=[]",
> >             "mtime": "1969-12-31 19:00:00.000000s",
> >             "osd_ops": [
> >                 "stat"
> >             ]
> >         }
> >     ],
> >     "linger_ops": [],
> >     "pool_ops": [],
> >     "pool_stat_ops": [],
> >     "statfs_ops": [],
> >     "command_ops": []
> > }
> >
> >
> > I've tried restarting the mds daemon (systemctl stop ceph-mds\*.service
> > ceph-mds.target && systemctl start ceph-mds\*.service ceph-mds.target).
> >
> >
> >
> > IO to the file that was being accessed when the host crashed is blocked.
> >
> >
> > Suggestions?
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
