Been trying to do a fairly large rsync onto a 3x-replicated, filestore, HDD-backed CephFS pool.
Luminous 12.2.1 on all daemons, kernel CephFS driver on the clients, Ubuntu 16.04 running a mix of 4.8 and 4.10 kernels, 2x10GbE networking between all daemons and clients.

> $ ceph versions
> {
>     "mon": {
>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>     },
>     "mgr": {
>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>     },
>     "osd": {
>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 74
>     },
>     "mds": {
>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2
>     },
>     "overall": {
>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 82
>     }
> }

The cluster is now in HEALTH_ERR:

> HEALTH_ERR 1 MDSs report oversized cache; 1 MDSs have many clients failing to respond to cache pressure; 1 MDSs behind on trimming; noout,nodeep-scrub flag(s) set; application not enabled on 1 pool(s); 242 slow requests are blocked > 32 sec; 769378 stuck requests are blocked > 4096 sec
> MDS_CACHE_OVERSIZED 1 MDSs report oversized cache
>     mdsdb(mds.0): MDS cache is too large (23GB/8GB); 1018 inodes in use by clients, 1 stray files
> MDS_CLIENT_RECALL_MANY 1 MDSs have many clients failing to respond to cache pressure
>     mdsdb(mds.0): Many clients (37) failing to respond to cache pressure client_count: 37
> MDS_TRIM 1 MDSs behind on trimming
>     mdsdb(mds.0): Behind on trimming (36252/30) max_segments: 30, num_segments: 36252
> OSDMAP_FLAGS noout,nodeep-scrub flag(s) set
> REQUEST_SLOW 242 slow requests are blocked > 32 sec
>     236 ops are blocked > 2097.15 sec
>     3 ops are blocked > 1048.58 sec
>     2 ops are blocked > 524.288 sec
>     1 ops are blocked > 32.768 sec
> REQUEST_STUCK 769378 stuck requests are blocked > 4096 sec
>     91 ops are blocked > 67108.9 sec
>     121258 ops are blocked > 33554.4 sec
>     308189 ops are blocked > 16777.2 sec
>     251586 ops are blocked > 8388.61 sec
>     88254 ops are blocked > 4194.3 sec
>     osds 0,1,3,6,8,12,15,16,17,21,22,23 have stuck requests > 16777.2 sec
>     osds 4,7,9,10,11,14,18,20 have stuck requests > 33554.4 sec
>     osd.13 has stuck requests > 67108.9 sec

This is across 8 nodes, each holding 3x 8TB HDDs, all backed by Intel P3600 NVMe drives for journaling.
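If the raw op dumps would be useful, I can pull them from the admin socket on the hosts holding the worst offenders (osd.13 has the oldest stuck requests above). I believe these are the right commands on 12.2.1, run locally on the OSD's host:

> $ ceph daemon osd.13 dump_ops_in_flight
> $ ceph daemon osd.13 dump_historic_ops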
SSD OSDs removed from the tree below for brevity.

> $ ceph osd tree
> ID  CLASS WEIGHT    TYPE NAME                    STATUS REWEIGHT PRI-AFF
> -13       87.28799  root ssd
>  -1      174.51500  root default
> -10      174.51500      rack default.rack2
> -55       43.62000          chassis node2425
>  -2       21.81000              host node24
>   0   hdd   7.26999                  osd.0            up  1.00000 1.00000
>   8   hdd   7.26999                  osd.8            up  1.00000 1.00000
>  16   hdd   7.26999                  osd.16           up  1.00000 1.00000
>  -3       21.81000              host node25
>   1   hdd   7.26999                  osd.1            up  1.00000 1.00000
>   9   hdd   7.26999                  osd.9            up  1.00000 1.00000
>  17   hdd   7.26999                  osd.17           up  1.00000 1.00000
> -56       43.63499          chassis node2627
>  -4       21.81999              host node26
>   2   hdd   7.27499                  osd.2            up  1.00000 1.00000
>  10   hdd   7.26999                  osd.10           up  1.00000 1.00000
>  18   hdd   7.27499                  osd.18           up  1.00000 1.00000
>  -5       21.81499              host node27
>   3   hdd   7.26999                  osd.3            up  1.00000 1.00000
>  11   hdd   7.26999                  osd.11           up  1.00000 1.00000
>  19   hdd   7.27499                  osd.19           up  1.00000 1.00000
> -57       43.62999          chassis node2829
>  -6       21.81499              host node28
>   4   hdd   7.26999                  osd.4            up  1.00000 1.00000
>  12   hdd   7.26999                  osd.12           up  1.00000 1.00000
>  20   hdd   7.27499                  osd.20           up  1.00000 1.00000
>  -7       21.81499              host node29
>   5   hdd   7.26999                  osd.5            up  1.00000 1.00000
>  13   hdd   7.26999                  osd.13           up  1.00000 1.00000
>  21   hdd   7.27499                  osd.21           up  1.00000 1.00000
> -58       43.62999          chassis node3031
>  -8       21.81499              host node30
>   6   hdd   7.26999                  osd.6            up  1.00000 1.00000
>  14   hdd   7.26999                  osd.14           up  1.00000 1.00000
>  22   hdd   7.27499                  osd.22           up  1.00000 1.00000
>  -9       21.81499              host node31
>   7   hdd   7.26999                  osd.7            up  1.00000 1.00000
>  15   hdd   7.26999                  osd.15           up  1.00000 1.00000
>  23   hdd   7.27499                  osd.23           up  1.00000 1.00000

I'm trying to figure out what in my configuration is off, because I am told that CephFS should be able to throttle client requests to match the underlying storage medium and not create such an extensive log jam. The relevant parts of my config:

> [mds]
> mds_cache_size = 0
> mds_cache_memory_limit = 8589934592
>
> [osd]
> osd_op_threads = 4
> filestore max sync interval = 30
> osd_max_backfills = 10
> osd_recovery_max_active = 16
> osd_op_thread_suicide_timeout = 600

I originally had mds_cache_size set to 10000000 from Jewel, but read that it is now better to zero that out and set the limit via mds_cache_memory_limit instead, so I set that to 8GB to see if it helped any (the commands I'd use to verify the running value are in the P.S. below).

Because I don't believe any kernel earlier than 4.13 exposes Luminous capabilities in the CephFS kernel driver, everything is using Jewel capabilities for CephFS:

> $ ceph features
> {
>     "mon": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 3
>         }
>     },
>     "mds": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 2
>         }
>     },
>     "osd": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 74
>         }
>     },
>     "client": {
>         "group": {
>             "features": "0x107b84a842aca",
>             "release": "hammer",
>             "num": 2
>         },
>         "group": {
>             "features": "0x40107b86a842ada",
>             "release": "jewel",
>             "num": 39
>         },
>         "group": {
>             "features": "0x7010fb86aa42ada",
>             "release": "jewel",
>             "num": 1
>         },
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 189
>         }
>     }
> }

Any help is appreciated.

Thanks,
Reed
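P.S. For what it's worth, I can also check the runtime cache values from the admin socket on the MDS host (mdsdb is my active MDS; I believe "cache status" is available on 12.2.x), e.g.:

> $ ceph daemon mds.mdsdb config get mds_cache_memory_limit
> $ ceph daemon mds.mdsdb cache status
> $ ceph daemon mds.mdsdb session ls

and, if a different limit is worth trying, inject it at runtime with something like:

> $ ceph tell mds.mdsdb injectargs '--mds_cache_memory_limit 8589934592'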
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com