Hi Aleksey,

Thanks for the detailed breakdown!

We're currently using replicated pools but will be testing EC pools soon
enough, and this is a useful set of parameters to look at. I also hadn't
considered the bluestore parameters, thanks for pointing that out.

Kind regards

On Wed, Jul 31, 2019 at 2:36 PM Aleksey Gutikov <aleksey.guti...@synesis.ru>
wrote:

> Hi Thomas,
>
> We did some investigation a while back and came up with several rules for
> configuring rgw and osd for big files stored on an erasure-coded pool.
> Hope it is useful.
> And if I have made any mistakes, please let me know.
>
> S3 object saving pipeline:
>
> - The S3 object is divided into multipart shards by the client.
> - Rgw splits each multipart shard into rados objects of size
> rgw_obj_stripe_size.
> - The primary osd stripes each rados object into ec stripes of width ==
> ec.k*profile.stripe_unit, ec-encodes them, sends the coded units to the
> secondary osds, and writes into the object store (bluestore).
> - Each subobject of a rados object has size == (rados object size)/k.
> - When writing to disk, bluestore can divide a rados subobject into
> extents of minimal size == bluestore_min_alloc_size_hdd.
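
The stages Aleksey describes above can be sketched numerically. A rough
Python illustration (the function name and the ceil-division handling of
partial shards are mine; the option names come from this thread):

```python
# Rough numeric sketch of the write pipeline above. Sizes are integers in
# bytes; this is an illustration of the arithmetic, not Ceph's actual code.

def pipeline_sizes(s3_object_size, multipart_part_size, rgw_obj_stripe_size,
                   ec_k, stripe_unit):
    """Sizes/counts produced at each stage of the S3 -> rados -> EC path."""
    # 1. Client splits the S3 object into multipart shards.
    multipart_shards = -(-s3_object_size // multipart_part_size)  # ceil
    # 2. Rgw splits each shard into rados objects of rgw_obj_stripe_size.
    rados_objects_per_shard = -(-multipart_part_size // rgw_obj_stripe_size)
    # 3. EC: each of the k data OSDs stores a subobject of 1/k of the object.
    subobject_size = rgw_obj_stripe_size // ec_k
    # EC stripe width across all k data chunks.
    ec_stripe_width = ec_k * stripe_unit
    return (multipart_shards, rados_objects_per_shard,
            subobject_size, ec_stripe_width)

MiB, KiB = 2**20, 2**10
print(pipeline_sizes(100 * MiB, 20 * MiB, 20 * MiB, 5, 256 * KiB))
# → (5, 1, 4194304, 1310720)
```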
>
> The following rules can save some space and iops:
>
> - rgw_multipart_min_part_size SHOULD be a multiple of rgw_obj_stripe_size
> (the client can use a larger value)
> - rgw_obj_stripe_size MUST equal rgw_max_chunk_size
> - ec stripe == osd_pool_erasure_code_stripe_unit or profile.stripe_unit
> - rgw_obj_stripe_size SHOULD be a multiple of profile.stripe_unit*ec.k
> - bluestore_min_alloc_size_hdd MAY be equal to bluefs_alloc_size (to
> avoid fragmentation)
> - rgw_obj_stripe_size/ec.k SHOULD be a multiple of
> bluestore_min_alloc_size_hdd
> - bluestore_min_alloc_size_hdd MAY be a multiple of profile.stripe_unit
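
For what it's worth, these rules are mechanical enough to script. A minimal
sketch of a checker, assuming a flat dict of sizes in bytes (the key names
mirror the options in the thread; `stripe_unit` and `ec_k` stand in for
profile.stripe_unit and ec.k):

```python
# Hypothetical helper: checks the divisibility rules listed above.
# Not an official Ceph tool; just the rules from this thread as code.

def check_ec_rgw_sizing(cfg):
    """Return a list of human-readable rule violations (empty if all pass)."""
    problems = []
    if cfg["rgw_multipart_min_part_size"] % cfg["rgw_obj_stripe_size"] != 0:
        problems.append("rgw_multipart_min_part_size should be a multiple "
                        "of rgw_obj_stripe_size")
    if cfg["rgw_obj_stripe_size"] != cfg["rgw_max_chunk_size"]:
        problems.append("rgw_obj_stripe_size must equal rgw_max_chunk_size")
    if cfg["rgw_obj_stripe_size"] % (cfg["stripe_unit"] * cfg["ec_k"]) != 0:
        problems.append("rgw_obj_stripe_size should be a multiple of "
                        "stripe_unit * k")
    if (cfg["rgw_obj_stripe_size"] // cfg["ec_k"]) \
            % cfg["bluestore_min_alloc_size_hdd"] != 0:
        problems.append("rgw_obj_stripe_size / k should be a multiple of "
                        "bluestore_min_alloc_size_hdd")
    if cfg["bluestore_min_alloc_size_hdd"] % cfg["stripe_unit"] != 0:
        problems.append("bluestore_min_alloc_size_hdd should be a multiple "
                        "of stripe_unit")
    return problems

MiB, KiB = 2**20, 2**10
example = {
    "rgw_multipart_min_part_size": 20 * MiB,
    "rgw_obj_stripe_size": 20 * MiB,
    "rgw_max_chunk_size": 20 * MiB,
    "ec_k": 5,
    "stripe_unit": 256 * KiB,
    "bluestore_min_alloc_size_hdd": 1 * MiB,
}
print(check_ec_rgw_sizing(example))  # → []
```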
>
> For example, if ec.k=5:
>
> - rgw_multipart_min_part_size = rgw_obj_stripe_size = rgw_max_chunk_size
> = 20M
> - rados object size == 20M
> - profile.stripe_unit = 256k
> - rados subobject size == 4M, 16 ec stripe units (20M / 5)
> - bluestore_min_alloc_size_hdd = bluefs_alloc_size = 1M
> - rados subobject can be written in 4 extents each containing 4 ec
> stripe units
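
The arithmetic of this example can be double-checked in a few lines (the
variable names below are mine, sizes in bytes):

```python
# Verifying the worked example above for ec.k=5.
MiB, KiB = 2**20, 2**10
rgw_obj_stripe_size = 20 * MiB
k = 5
stripe_unit = 256 * KiB
min_alloc = 1 * MiB          # bluestore_min_alloc_size_hdd = bluefs_alloc_size

subobject = rgw_obj_stripe_size // k            # bytes per data OSD
units_per_subobject = subobject // stripe_unit  # ec stripe units per subobject
extents = subobject // min_alloc                # bluestore extents per subobject
units_per_extent = min_alloc // stripe_unit     # ec stripe units per extent
print(subobject // MiB, units_per_subobject, extents, units_per_extent)
# → 4 16 4 4
```

which matches the 4M subobject, 16 stripe units, and 4 extents of 4 units each.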
>
>
>
> On 30.07.19 17:35, Thomas Bennett wrote:
> > Hi,
> >
> > Does anyone out there use bigger than default values for
> > rgw_max_chunk_size and rgw_obj_stripe_size?
> >
> > I'm planning to set rgw_max_chunk_size and rgw_obj_stripe_size  to
> > 20MiB, as it suits our use case and from our testing we can't see any
> > obvious reason not to.
> >
> > Is there some convincing experience that we should stick with 4MiBs?
> >
> > Regards,
> > Tom
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> --
>
> Best regards!
> Aleksei Gutikov | Ceph storage engineer
> synesis.ru | Minsk. BY
>


-- 
Thomas Bennett

Storage Engineer at SARAO
