Hello, Ceph users,
TL;DR: how to integrate librados async operations with an event-loop library?
Details:
I need to store lots of objects, but I don't need a full-featured S3 API,
and my objects are mostly small (the majority are under 10 KB, see the
distribution below), so per-object overhead is significant for me.
If I understand it correctly (do I?), radosgw stores each S3 object in its
own RADOS object, so it is not suitable for my purpose -- packing
multiple source objects into a larger RADOS object is a necessity.
I am considering writing my own frontend with RADOS pool[s] as a storage
backend, using librados (or maybe libradosstriper for large objects),
an EC pool for data, and a replicated pool for OMAP object metadata.
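To make the packing idea concrete, here is a minimal sketch of what I have
in mind (C, untested; the pool, object, and key names are made up, and it
assumes a single writer per pack object, since the stat+append pair below
is not atomic):

#include <errno.h>
#include <rados/librados.h>
#include <stdio.h>
#include <string.h>

/* Append one small source object to a pack object in the EC data pool and
 * record "<pack>:<offset>:<length>" under its key on an index object kept
 * in the replicated metadata pool. */
static int store_small_object(rados_ioctx_t data_io, rados_ioctx_t meta_io,
                              const char *pack_oid, const char *index_oid,
                              const char *key, const char *buf, size_t len)
{
    uint64_t offset = 0;
    time_t mtime;

    /* The current pack object size becomes the offset of the new blob. */
    int r = rados_stat(data_io, pack_oid, &offset, &mtime);
    if (r < 0 && r != -ENOENT)
        return r;

    /* Append the payload to the pack object. */
    r = rados_append(data_io, pack_oid, buf, len);
    if (r < 0)
        return r;

    /* Store the location of the blob as an OMAP key/value pair. */
    char val[128];
    snprintf(val, sizeof(val), "%s:%llu:%zu", pack_oid,
             (unsigned long long)offset, len);
    const char *keys[1] = { key };
    const char *vals[1] = { val };
    size_t lens[1] = { strlen(val) };

    rados_write_op_t op = rados_write_op_create();
    rados_write_op_omap_set(op, keys, vals, lens, 1);
    r = rados_write_op_operate(op, meta_io, index_oid, NULL, 0);
    rados_write_op_release(op);
    return r;
}

Reading a source object back would then be an OMAP lookup on the index
object followed by a rados_read() of the recorded offset/length.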
Librados supports async I/O with callbacks, but I am not sure how it
fits into a wider environment of other asynchronous operations in the
same process:
- is librados thread-safe, or should I use, for example, a per-thread
rados_ioctx_t or even a per-thread rados_t instance? Should I do some kind
of locking when calling librados functions from a multi-threaded program?
(See the first sketch after this list for the pattern I had in mind.)
- is it possible to incorporate librados into an event-driven program
using an I/O event loop library (libevent, glib, ...)? If so, how can I
make librados wait for socket I/O using the global event loop?
(See the second sketch after this list for the bridging I am considering.)
- is it still recommended to have at most 100K OMAP keys per RADOS object?
I have created an empty object and set about 120K OMAP keys on it
using the rados(1) command, and so far the cluster is HEALTH_OK.
- what is the preferred RADOS object size (for packing many small
source objects into a single RADOS object)?
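Regarding the thread-safety question, this is the pattern I was planning
to use (just a sketch; the pool name and thread count are made up): one
shared rados_t for the whole process and a private rados_ioctx_t per
worker thread. Does this need any extra locking?

#include <pthread.h>
#include <rados/librados.h>

static rados_t cluster;                 /* shared, created once in main() */

static void *worker(void *arg)
{
    (void)arg;
    rados_ioctx_t io;                   /* per-thread I/O context */
    if (rados_ioctx_create(cluster, "mydata", &io) < 0)
        return NULL;
    /* ... per-thread reads/writes on io ... */
    rados_ioctx_destroy(io);
    return NULL;
}

int main(void)
{
    if (rados_create(&cluster, NULL) < 0 ||
        rados_conf_read_file(cluster, NULL) < 0 ||
        rados_connect(cluster) < 0)
        return 1;

    pthread_t th[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(th[i], NULL);

    rados_shutdown(cluster);
    return 0;
}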
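Regarding the event loop question, the only thing I could come up with so
far is the bridging below (again only a sketch, with poll(2) standing in
for the real libevent/glib loop): librados apparently does its network I/O
on its own internal threads, so the AIO completion callback just writes a
token to a pipe, and the read end of the pipe would be registered with the
global event loop. Is this the intended way, or is there a better hook?

#include <poll.h>
#include <rados/librados.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int wake_fd[2];          /* pipe: [0] is watched by the event loop */

struct pending_write {
    rados_completion_t comp;
    const char *oid;
};

/* Invoked from a librados-internal thread; just wake up the event loop. */
static void on_complete(rados_completion_t comp, void *arg)
{
    struct pending_write *pw = arg;
    (void)comp;
    ssize_t n = write(wake_fd[1], &pw, sizeof(pw));
    (void)n;
}

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;

    if (rados_create(&cluster, NULL) < 0 ||
        rados_conf_read_file(cluster, NULL) < 0 ||
        rados_connect(cluster) < 0)
        return 1;
    if (rados_ioctx_create(cluster, "mydata", &io) < 0)  /* pool name made up */
        return 1;
    if (pipe(wake_fd) < 0)
        return 1;

    static struct pending_write pw = { .oid = "test-obj" };
    rados_aio_create_completion(&pw, on_complete, NULL, &pw.comp);

    static const char data[] = "hello";
    rados_aio_write(io, pw.oid, pw.comp, data, sizeof(data) - 1, 0);

    /* Stand-in for the real event loop: libevent/glib would watch wake_fd[0]. */
    struct pollfd pfd = { .fd = wake_fd[0], .events = POLLIN };
    if (poll(&pfd, 1, -1) == 1) {
        struct pending_write *done;
        if (read(wake_fd[0], &done, sizeof(done)) == (ssize_t)sizeof(done)) {
            printf("%s: rc=%d\n", done->oid,
                   rados_aio_get_return_value(done->comp));
            rados_aio_release(done->comp);
        }
    }

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}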
===============================
FWIW, one instance of my data has the following object count and
distribution of object sizes:
Total: 210M objs, 114 TB total size
0B-1KB: 21M objs, 18 GB total size
1KB-10KB: 137M objs, 344 GB total size
10KB-100KB: 31M objs, 1 TB total size
100KB-1MB: 15M objs, 5 TB total size
1MB-10MB: 4M objs, 15 TB total size
10MB-100MB: 966K objs, 25 TB total size
100MB-1GB: 137K objs, 38 TB total size
1GB-10GB: 13K objs, 26 TB total size
10GB-100GB: 162 objs, 3 TB total size
I have a not-so-big Ceph cluster (3 mons, 15-ish OSD nodes, each with two
HDD-based OSDs with metadata/OMAP on NVMe).
Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall