Our workload involves creating and destroying a lot of pools. Each pool
has 100 pgs, so it adds up. Could this be causing the problem? What
would you suggest instead?
...this is most likely the cause. Deleting a pool causes the data and
pgs associated with it to be deleted asynchronously, which can be a lot
of background work for the osds.
If you're using the cfq scheduler you can try decreasing the priority
of these operations with the "osd disk thread ioprio..." options:
http://ceph.com/docs/master/rados/configuration/osd-config-ref/#operations
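For reference, a minimal ceph.conf sketch of what that docs page describes (option names taken from that reference; these ioprio settings only take effect when the disks are using the cfq scheduler, and the exact values here are just illustrative):

```ini
[osd]
# Lower the IO priority of the OSD disk (background) thread so that
# background cleanup work, such as pool deletion, yields to client IO.
# Only honored when the underlying disk scheduler is cfq.
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
```

After changing these you would restart the OSDs (or inject the settings at runtime) for them to apply.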
If that doesn't help enough, deleting data from pools before deleting
the pools might help, since you can control the rate more finely. And of
course not creating/deleting so many pools would eliminate the hidden
background cost of deleting the pools.
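As a rough sketch of the "delete the data first" approach, assuming one rbd image per pool and the standard rbd/ceph CLI (pool and image names here are placeholders): removing the image before the pool pushes the object deletion through normal client-paced operations, so by the time the pool itself is deleted there is little left for the OSDs to clean up in the background.

```shell
#!/bin/sh
# Sketch: drain a pool before removing it.
# POOL and IMAGE are example names, not from the thread.
POOL=mypool
IMAGE=myimage

# Delete the rbd image first; its objects are removed at client pace,
# which gives you control over the deletion rate.
rbd rm "$POOL/$IMAGE"

# Then delete the (now mostly empty) pool. Note the pool name is given
# twice, plus the confirmation flag, per the ceph CLI's safety check.
ceph osd pool delete "$POOL" "$POOL" --yes-i-really-really-mean-it
```

If finer rate control is needed, the image's objects could instead be removed one at a time (e.g. via `rados -p "$POOL" ls` piped into `rados rm` with a delay between calls) before dropping the pool.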
Thanks for your answer. Some follow-up questions:
- I wouldn't expect that pool deletion is the problem, since our pools,
although many, don't contain much data. Typically, we will have one rbd
per pool, several GB in size, but in practice containing little data.
Would you expect the performance penalty from deleting a pool to be
proportional to the requested (provisioned) size of the rbd, or to the
quantity of data actually stored in it?
- Rather than creating and deleting multiple pools, each containing a
single rbd, do you think we would see a speed-up if we were to instead
have one pool, containing multiple (frequently created and deleted)
rbds? Does the performance penalty stem only from deleting pools
themselves, or from deleting objects within the pool as well?
- Somewhat off-topic, but for my own curiosity: Why is deleting data so
slow, in terms of ceph's architecture? Shouldn't it just be a matter of
flagging a region as available and allowing it to be overwritten, as a
traditional file system would?
Jeff
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com