Hi Aleksey,

Thanks for the detailed breakdown!
We're currently using replicated pools but will be testing EC pools soon enough, and this is a useful set of parameters to look at. I had also not considered the bluestore parameters; thanks for pointing that out.

Kind regards

On Wed, Jul 31, 2019 at 2:36 PM Aleksey Gutikov <aleksey.guti...@synesis.ru> wrote:

> Hi Thomas,
>
> We did some investigation a while ago and arrived at several rules for
> configuring rgw and OSDs for big files stored on an erasure-coded pool.
> I hope it is useful. If I have made any mistakes, please let me know.
>
> S3 object saving pipeline:
>
> - The S3 object is divided into multipart shards by the client.
> - Rgw shards each multipart shard into rados objects of size
> rgw_obj_stripe_size.
> - The primary OSD stripes each rados object into ec stripes of width ==
> ec.k * profile.stripe_unit, EC-codes them, sends the units to the
> secondary OSDs, and writes them into the object store (bluestore).
> - Each subobject of a rados object has size == (rados object size)/k.
> - While writing to disk, bluestore can then divide a rados subobject into
> extents of minimal size == bluestore_min_alloc_size_hdd.
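Just to check my understanding of the pipeline described above, here is a rough Python sketch of the sizing arithmetic (my own code, not Ceph's; variable names mirror the config options, and the padding-to-whole-stripes behaviour is my assumption):

```python
# Sketch of the sizing pipeline: a multipart part is cut into rados
# objects of rgw_obj_stripe_size; each rados object is EC-encoded into
# k subobjects; bluestore allocates each subobject in units of
# bluestore_min_alloc_size_hdd.

MiB = 1024 * 1024
KiB = 1024

rgw_obj_stripe_size = 20 * MiB   # size of one rados object
ec_k = 5                         # data chunks in the EC profile
stripe_unit = 256 * KiB          # profile.stripe_unit
min_alloc_hdd = 1 * MiB          # bluestore_min_alloc_size_hdd

def shard_part(part_size):
    """Split one multipart part into rados-object sizes."""
    full, rem = divmod(part_size, rgw_obj_stripe_size)
    return [rgw_obj_stripe_size] * full + ([rem] if rem else [])

def subobject_size(rados_object_size):
    """Size of each of the k EC subobjects for one rados object,
    assuming the object is padded up to a whole number of stripes."""
    ec_stripe_width = ec_k * stripe_unit
    stripes = -(-rados_object_size // ec_stripe_width)  # ceil division
    return stripes * stripe_unit

objects = shard_part(100 * MiB)
sub = subobject_size(objects[0])
extents = sub // min_alloc_hdd
print(len(objects), sub // MiB, extents)  # prints: 5 4 4
```

With a 100 MiB part this reproduces the numbers from your example below: 5 rados objects of 20 MiB, 4 MiB subobjects (16 stripe units), written as 4 extents.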
>
> The following rules can save some space and IOPS:
>
> - rgw_multipart_min_part_size SHOULD be a multiple of rgw_obj_stripe_size
> (the client can use a different, larger value)
> - rgw_obj_stripe_size MUST equal rgw_max_chunk_size
> - ec stripe == osd_pool_erasure_code_stripe_unit or profile.stripe_unit
> - rgw_obj_stripe_size SHOULD be a multiple of profile.stripe_unit * ec.k
> - bluestore_min_alloc_size_hdd MAY be equal to bluefs_alloc_size (to
> avoid fragmentation)
> - rgw_obj_stripe_size/ec.k SHOULD be a multiple of
> bluestore_min_alloc_size_hdd
> - bluestore_min_alloc_size_hdd MAY be a multiple of profile.stripe_unit
>
> For example, if ec.k=5:
>
> - rgw_multipart_min_part_size = rgw_obj_stripe_size = rgw_max_chunk_size
> = 20M
> - rados object size == 20M
> - profile.stripe_unit = 256k
> - rados subobject size == 4M (20M / 5), i.e. 16 ec stripe units
> - bluestore_min_alloc_size_hdd = bluefs_alloc_size = 1M
> - a rados subobject can be written in 4 extents, each containing 4 ec
> stripe units
>
> On 30.07.19 17:35, Thomas Bennett wrote:
> > Hi,
> >
> > Does anyone out there use bigger than default values for
> > rgw_max_chunk_size and rgw_obj_stripe_size?
> >
> > I'm planning to set rgw_max_chunk_size and rgw_obj_stripe_size to
> > 20MiB, as it suits our use case, and from our testing we can't see any
> > obvious reason not to.
> >
> > Is there some convincing experience that we should stick with 4MiB?
> >
> > Regards,
> > Tom
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Best regards!
> Aleksei Gutikov | Ceph storage engineer
> synesis.ru | Minsk, BY

--
Thomas Bennett
Storage Engineer at SARAO
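PS: the divisibility rules quoted above can be sanity-checked mechanically. A small Python sketch (my own code; the keys mirror the Ceph option names, and the values are taken from the ec.k=5 example):

```python
MiB = 1024 * 1024
KiB = 1024

# Values from the ec.k=5 example in the quoted message.
cfg = {
    "rgw_multipart_min_part_size": 20 * MiB,
    "rgw_obj_stripe_size": 20 * MiB,
    "rgw_max_chunk_size": 20 * MiB,
    "profile.stripe_unit": 256 * KiB,
    "ec.k": 5,
    "bluestore_min_alloc_size_hdd": 1 * MiB,
    "bluefs_alloc_size": 1 * MiB,
}

def check(c):
    """Evaluate each quoted sizing rule; returns (rule, passed) pairs."""
    k, su = c["ec.k"], c["profile.stripe_unit"]
    return [
        ("part size is a multiple of rgw_obj_stripe_size",
         c["rgw_multipart_min_part_size"] % c["rgw_obj_stripe_size"] == 0),
        ("rgw_obj_stripe_size == rgw_max_chunk_size",
         c["rgw_obj_stripe_size"] == c["rgw_max_chunk_size"]),
        ("rgw_obj_stripe_size is a multiple of stripe_unit * k",
         c["rgw_obj_stripe_size"] % (su * k) == 0),
        ("subobject is a multiple of bluestore_min_alloc_size_hdd",
         (c["rgw_obj_stripe_size"] // k)
         % c["bluestore_min_alloc_size_hdd"] == 0),
        ("min_alloc_size_hdd == bluefs_alloc_size",
         c["bluestore_min_alloc_size_hdd"] == c["bluefs_alloc_size"]),
        ("min_alloc_size_hdd is a multiple of stripe_unit",
         c["bluestore_min_alloc_size_hdd"] % su == 0),
    ]

for rule, ok in check(cfg):
    print("OK  " if ok else "FAIL", rule)
```

All six checks pass for the example values, which matches the worked numbers in the quoted message.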