Hi Thomas,

We did some investigation a while ago and came up with several rules for configuring rgw and osd for big files stored on an erasure-coded pool.
I hope it is useful, and if I have made any mistakes, please let me know.

S3 object saving pipeline (a short size-arithmetic sketch follows the list):

- The S3 object is divided into multipart parts by the client.
- Rgw splits each part into rados objects of size rgw_obj_stripe_size.
- The primary osd stripes each rados object into EC stripes of width == ec.k * profile.stripe_unit, erasure-codes them, sends the coded units to the secondary osds, and writes them into the object store (bluestore).
- Each per-osd subobject of a rados object has size == (rados object size) / ec.k.
- While writing to disk, bluestore may further split a rados subobject into extents of minimal size == bluestore_min_alloc_size_hdd.
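
To make that arithmetic concrete, here is a minimal Python sketch of the sizes at each stage. The variable names mirror the Ceph options discussed above, but the part size and the code itself are illustrative only, not the actual rgw/osd code path:

MiB = 1024 * 1024
KiB = 1024

# Illustrative values; names mirror the Ceph options discussed above.
rgw_obj_stripe_size = 20 * MiB            # rados object size written by rgw
ec_k = 5                                  # data chunks in the EC profile
stripe_unit = 256 * KiB                   # profile.stripe_unit
bluestore_min_alloc_size_hdd = 1 * MiB    # minimal bluestore extent on HDD

part_size = 100 * MiB                     # one multipart part uploaded by the client

# rgw cuts the part into rados objects of rgw_obj_stripe_size:
rados_objects = part_size // rgw_obj_stripe_size              # -> 5

# the primary osd EC-codes each rados object in stripes of k * stripe_unit,
# and every data osd ends up with a subobject of 1/k of the rados object:
stripe_width = ec_k * stripe_unit                             # -> 1280 KiB
subobject_size = rgw_obj_stripe_size // ec_k                  # -> 4 MiB

# bluestore then writes that subobject in min_alloc_size extents:
extents_per_subobject = subobject_size // bluestore_min_alloc_size_hdd   # -> 4

print(rados_objects, stripe_width // KiB, subobject_size // MiB, extents_per_subobject)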

The following rules can save some space and IOPS (a sanity-check sketch follows the list):

- rgw_multipart_min_part_size SHOULD be a multiple of rgw_obj_stripe_size (the client can use a different, larger part size)
- rgw_obj_stripe_size MUST equal rgw_max_chunk_size
- the EC stripe unit is osd_pool_erasure_code_stripe_unit, or profile.stripe_unit
- rgw_obj_stripe_size SHOULD be a multiple of profile.stripe_unit * ec.k
- bluestore_min_alloc_size_hdd MAY be equal to bluefs_alloc_size (to avoid fragmentation)
- rgw_obj_stripe_size / ec.k SHOULD be a multiple of bluestore_min_alloc_size_hdd
- bluestore_min_alloc_size_hdd MAY be a multiple of profile.stripe_unit
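
Expressed as divisibility checks, the rules look roughly like the sketch below. This is not something Ceph provides, just a hand-rolled sanity check over candidate values:

def check_alignment(rgw_multipart_min_part_size, rgw_obj_stripe_size,
                    rgw_max_chunk_size, stripe_unit, ec_k,
                    bluestore_min_alloc_size_hdd):
    """Hand-rolled sketch of the alignment rules above."""
    assert rgw_obj_stripe_size == rgw_max_chunk_size, \
        "rgw_obj_stripe_size MUST equal rgw_max_chunk_size"
    assert rgw_multipart_min_part_size % rgw_obj_stripe_size == 0, \
        "rgw_multipart_min_part_size SHOULD be a multiple of rgw_obj_stripe_size"
    assert rgw_obj_stripe_size % (stripe_unit * ec_k) == 0, \
        "rgw_obj_stripe_size SHOULD be a multiple of profile.stripe_unit * ec.k"
    assert (rgw_obj_stripe_size // ec_k) % bluestore_min_alloc_size_hdd == 0, \
        "rados subobject SHOULD be a multiple of bluestore_min_alloc_size_hdd"
    assert bluestore_min_alloc_size_hdd % stripe_unit == 0, \
        "bluestore_min_alloc_size_hdd MAY be a multiple of profile.stripe_unit"

MiB, KiB = 1024 * 1024, 1024
check_alignment(20 * MiB, 20 * MiB, 20 * MiB, 256 * KiB, 5, 1 * MiB)   # passes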

For example, if ec.k=5 (the arithmetic is checked in the snippet after the list):

- rgw_multipart_min_part_size = rgw_obj_stripe_size = rgw_max_chunk_size = 20M
- rados object size == 20M
- profile.stripe_unit = 256k
- rados subobject size == 20M / 5 == 4M, i.e. 16 ec stripe units
- bluestore_min_alloc_size_hdd = bluefs_alloc_size = 1M
- each rados subobject can be written as 4 extents, each containing 4 ec stripe units
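
The numbers in this example check out with plain arithmetic (nothing Ceph-specific):

MiB, KiB = 1024 * 1024, 1024

rados_object_size = 20 * MiB
ec_k = 5
stripe_unit = 256 * KiB
min_alloc = 1 * MiB      # bluestore_min_alloc_size_hdd

subobject = rados_object_size // ec_k         # 4 MiB per data osd
assert subobject == 4 * MiB
assert subobject // stripe_unit == 16         # 16 ec stripe units per subobject
assert subobject // min_alloc == 4            # written as 4 extents
assert min_alloc // stripe_unit == 4          # each extent holds 4 stripe units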



On 30.07.19 17:35, Thomas Bennett wrote:
Hi,

Does anyone out there use bigger than default values for rgw_max_chunk_size and rgw_obj_stripe_size?

I'm planning to set rgw_max_chunk_size and rgw_obj_stripe_size to 20MiB, as it suits our use case, and from our testing we can't see any obvious reason not to.

Is there any convincing experience suggesting that we should stick with 4 MiB?

Regards,
Tom




--

Best regards!
Aleksei Gutikov | Ceph storage engineer
synesis.ru | Minsk. BY
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
