Hi, I have a big-ish cluster that, amongst other things, has a radosgw configured with an erasure-coded (EC) data pool (k=12, m=4). The cluster is currently running Jewel (10.2.7).
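For context, the pool was created along these lines (the profile name and failure domain below are illustrative, not necessarily what we used; note Jewel takes ruleset-failure-domain rather than the later crush-failure-domain):

  ceph osd erasure-code-profile set rgw-ec-profile k=12 m=4 ruleset-failure-domain=host
  ceph osd pool create .rgw.buckets.ec 2048 2048 erasure rgw-ec-profile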
That pool spans 244 HDDs and has 2048 PGs. From ceph df detail:

  NAME             ID  CATEGORY  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS   DIRTY   READ    WRITE  RAW USED
  .rgw.buckets.ec  26  -         N/A            N/A          76360G  28.66  185T       97908947  95614k  73271k  185M   101813G
  ct-radosgw       37  -         N/A            N/A          4708G   70.69  1952G      5226185   2071k   591M    1518M  9416G

The ct-radosgw pool should be size 3, but due to an unrelated issue (a PDU failure) it is currently size 2.

Whenever I flush data from the cache tier to the base tier, the OSDs start updating their local leveldb databases, saturating their disks at 100% IO, until they a) get marked down for not responding, and/or b) hit the suicide timeout. I have other pools targeting those same OSDs, but so far nothing has happened when the IO goes to those other pools.

Any ideas on where to proceed? For reference, the flush command and the tiering-agent settings involved are sketched below.

thanks,
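A rough sketch of how the flush gets triggered and the tiering-agent knobs in play (Jewel syntax; the ratio values here are illustrative, not what I'm actually running):

  # Thresholds at which the tiering agent starts flushing dirty objects
  # from the cache pool to the EC base pool (fractions of the pool's
  # target_max_bytes / target_max_objects):
  ceph osd pool set ct-radosgw cache_target_dirty_ratio 0.4
  ceph osd pool set ct-radosgw cache_target_dirty_high_ratio 0.6

  # Flush and evict everything in the cache tier by hand; this is the
  # kind of operation that sets off the leveldb churn described above:
  rados -p ct-radosgw cache-flush-evict-all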