Networking is 10Gig. I notice recovery IO is wildly variable; I assume that's normal.
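(For the record, in case the swings ever get in the way of client IO while
testing: recovery can apparently be throttled at runtime via the usual OSD
knobs. A minimal sketch, with illustrative values rather than
recommendations:

    # limit concurrent backfills and active recovery ops per OSD
    # (illustrative values; re-inject the defaults to undo)
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

)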
Very little load, as this is yet to go into production; I was "seeing what
it would handle" at the time it broke. I checked this morning and the slow
request had gone, and I could access the blocked file again.

All OSes are Ubuntu 16.04.1 with the stock 4.4.0-72-generic kernel, and
there were two CephFS clients accessing it, also 16.04.1. Ceph on all is
11.2.0, installed from the debian-kraken repos at download.ceph.com. All
OSDs are bluestore.

As of now all is okay, so I don't want to waste anyone's time on a wild
goose chase.

On Wed, May 24, 2017 at 6:15 AM, John Spray <jsp...@redhat.com> wrote:
> On Tue, May 23, 2017 at 11:41 PM, Daniel K <satha...@gmail.com> wrote:
> > Have a 20-OSD cluster ("my first ceph cluster") that has another 400
> > OSDs en route.
> >
> > I was "beating up" on the cluster, and had been writing to a 6TB file
> > in CephFS for several hours, during which I changed the crushmap to
> > better match my environment, generating a bunch of recovery IO. After
> > about 5.8TB written, one of the OSD hosts (which is also a MON, soon
> > to be rectified) crashed with 5 OSDs on it, and after rebooting I have
> > this in ceph -s:
> > (The degraded/misplaced warnings are likely because the cluster hasn't
> > completed rebalancing after I changed the crushmap, I assume.)
>
> Losing a quarter of your OSDs while simultaneously rebalancing after
> editing your CRUSH map is a brutal thing to do to a Ceph cluster, and I
> would expect it to impact your client IO severely.
>
> I see that you've got 112MB/s of recovery going on, which may or may not
> be saturating some links depending on whether you're using 1gig or 10gig
> networking.
>
> > 2017-05-23 18:33:13.775924 7ff9d3230700 -1 WARNING: the following
> > dangerous and experimental features are enabled: bluestore
> > 2017-05-23 18:33:13.781732 7ff9d3230700 -1 WARNING: the following
> > dangerous and experimental features are enabled: bluestore
> >     cluster e92e20ca-0fe6-4012-86cc-aa51e0466661
> >      health HEALTH_WARN
> >             440 pgs backfill_wait
> >             7 pgs backfilling
> >             85 pgs degraded
> >             5 pgs recovery_wait
> >             85 pgs stuck degraded
> >             452 pgs stuck unclean
> >             77 pgs stuck undersized
> >             77 pgs undersized
> >             recovery 196526/3554278 objects degraded (5.529%)
> >             recovery 1690392/3554278 objects misplaced (47.559%)
> >             mds0: 1 slow requests are blocked > 30 sec
> >      monmap e4: 3 mons at
> > {stor-vm1=10.0.15.51:6789/0,stor-vm2=10.0.15.52:6789/0,stor-vm3=10.0.15.53:6789/0}
> >             election epoch 136, quorum 0,1,2 stor-vm1,stor-vm2,stor-vm3
> >       fsmap e21: 1/1/1 up {0=stor-vm4=up:active}
> >         mgr active: stor-vm1 standbys: stor-vm2
> >      osdmap e4655: 20 osds: 20 up, 20 in; 450 remapped pgs
> >             flags sortbitwise,require_jewel_osds,require_kraken_osds
> >       pgmap v192589: 1428 pgs, 5 pools, 5379 GB data, 1345 kobjects
> >             11041 GB used, 16901 GB / 27943 GB avail
> >             196526/3554278 objects degraded (5.529%)
> >             1690392/3554278 objects misplaced (47.559%)
> >                  975 active+clean
> >                  364 active+remapped+backfill_wait
> >                   76 active+undersized+degraded+remapped+backfill_wait
> >                    3 active+recovery_wait+degraded+remapped
> >                    3 active+remapped+backfilling
> >                    3 active+degraded+remapped+backfilling
> >                    2 active+recovery_wait+degraded
> >                    1 active+clean+scrubbing+deep
> >                    1 active+undersized+degraded+remapped+backfilling
> > recovery io 112 MB/s, 28 objects/s
> >
> > Seems related to the "corrupted rbd filesystems since jewel" thread.
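[Inline aside: while waiting out a rebalance like the one above, the stuck
PGs and the OSDs they map to can be listed with stock commands; nothing
cluster-specific assumed here:

    ceph health detail           # per-PG detail for everything in HEALTH_WARN
    ceph pg dump_stuck unclean   # PGs not yet active+clean, with up/acting OSD sets

]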
> >
> > log entries on the MDS server:
> >
> > 2017-05-23 18:27:12.966218 7f95ed6c0700  0 log_channel(cluster) log
> > [WRN] : slow request 243.113407 seconds old, received at 2017-05-23
> > 18:23:09.852729: client_request(client.204100:5 getattr pAsLsXsFs
> > #100000003ec 2017-05-23 17:48:23.770852 RETRY=2 caller_uid=0,
> > caller_gid=0{}) currently failed to rdlock, waiting
> >
> > output of ceph daemon mds.stor-vm4 objecter_requests (changes each
> > time I run it):
>
> If that changes each time you run it, then it means the OSD requests
> from the MDS are happening.
>
> However, it's possible that you have multiple clients and one of them is
> stuck trying to write something back (to a PG that is not accepting the
> write (yet?)), thereby preventing the MDS from granting a lock to
> another client.
>
> What clients (+versions) are involved, what's the workload, what
> versions of Ceph?
>
> John
>
> > root@stor-vm4:/var/log/ceph# ceph daemon mds.stor-vm4 objecter_requests
> > {
> >     "ops": [
> >         {
> >             "tid": 66700,
> >             "pg": "1.60e95c32",
> >             "osd": 4,
> >             "object_id": "100000003ec.003efb9f",
> >             "object_locator": "@1",
> >             "target_object_id": "100000003ec.003efb9f",
> >             "target_object_locator": "@1",
> >             "paused": 0,
> >             "used_replica": 0,
> >             "precalc_pgid": 0,
> >             "last_sent": "1.47461e+06s",
> >             "attempts": 1,
> >             "snapid": "head",
> >             "snap_context": "0=[]",
> >             "mtime": "1969-12-31 19:00:00.000000s",
> >             "osd_ops": [
> >                 "stat"
> >             ]
> >         }
> >     ],
> >     "linger_ops": [],
> >     "pool_ops": [],
> >     "pool_stat_ops": [],
> >     "statfs_ops": [],
> >     "command_ops": []
> > }
> >
> > I've tried restarting the mds daemon (systemctl stop ceph-mds\*.service
> > ceph-mds.target && systemctl start ceph-mds\*.service ceph-mds.target).
> >
> > IO to the file that was being accessed when the host crashed is
> > blocked.
> >
> > Suggestions?
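P.S. If this wedges again, the objecter output above points somewhere
concrete: the stuck op is a stat against pg 1.60e95c32 on osd.4. A sketch
of what I'd poke at next (daemon names are per this cluster; the data pool
name is a placeholder to substitute):

    ceph daemon osd.4 dump_ops_in_flight          # is the MDS's stat op sitting in osd.4's queue?
    ceph daemon mds.stor-vm4 session ls           # which clients are connected and hold caps
    ceph osd map <datapool> 100000003ec.003efb9f  # acting set for the blocked object

That should show whether the OSD is holding the request or whether, as
suggested above, another client is holding the lock.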
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com