On Thu, Nov 15, 2018 at 2:30 PM 赵赵贺东 <zhaohed...@gmail.com> wrote:
>
> I tested on the 12-OSD cluster and changed objecter_inflight_op_bytes from
> 100MB to 300MB; performance did not change noticeably.
> But librbd already performed well on the 12-OSD cluster to begin with,
> so this result does not tell me much.
> >>>> In a small cluster (12 OSDs), 4M seq write performance for librbd vs. KRBD
> >>>> is about 0.89 : 1 (177MB/s : 198MB/s).
> >>>> In a big cluster (72 OSDs), 4M seq write performance for librbd vs. KRBD
> >>>> is about 0.38 : 1 (420MB/s : 1080MB/s).
>
> Our problem is librbd's poor performance in the big cluster (72 OSDs).
> But I cannot test on 72 OSDs right now, because other tests are running.
> I will test on 72 OSDs when our cluster is ready.
>
> It is a little hard to understand why objecter_inflight_op_bytes=100MB works
> well in the 12-OSD cluster but poorly in the 72-OSD cluster.
> Does objecter_inflight_op_bytes have no effect on krbd, and affect only
> librbd?
Correct -- the "ceph.conf" config settings are for user-space tooling only.
Given that you are writing full 4MiB objects in your test, any user-space
performance degradation is probably going to be in the librados layer and
below. Once in-flight data reaches that 100 MiB limit, the IO path blocks
until in-flight IO completes. You also might just be hitting the default
throughput of the lower-level messenger code, so perhaps you need to throw
more threads at it (ms_async_op_threads / ms_async_max_op_threads) or
change its throttles (ms_dispatch_throttle_bytes). Also, depending on your
cluster and krbd versions, perhaps the OSDs are telling your clients to
back off but only librados is responding to it.

You should also take into account the validity of your test case -- does
it really match the expected workload that you are trying to optimize for?

> Thanks.
>
> On Nov 15, 2018, at 3:50 PM, 赵赵贺东 <zhaohed...@gmail.com> wrote:
>
> > Thank you for your suggestion.
> > It really gives me a lot of inspiration.
> >
> > I will test as you suggest, and browse through src/common/config_opts.h
> > to see if I can find some performance-related configs.
> >
> > But our OSD node hardware itself is very poor; that is the truth, and we
> > have to face it.
> > There are two OSDs on each ARM board, with 2 GB of memory and 2 x 10 TB
> > HDDs per board, so each OSD has 1 GB of memory to support a 10 TB HDD.
> > We must try to make the cluster work as well as we can.
> >
> > Thanks.
> >
> >> On Nov 15, 2018, at 2:08 PM, Jason Dillaman <jdill...@redhat.com> wrote:
> >>
> >> Attempting to send 256 concurrent 4MiB writes via librbd will pretty
> >> quickly hit the default "objecter_inflight_op_bytes = 100 MiB" limit,
> >> which will drastically slow (stall) librados. I would recommend
> >> re-testing librbd w/ a much higher throttle override.
> >> On Thu, Nov 15, 2018 at 11:34 AM 赵赵贺东 <zhaohed...@gmail.com> wrote:
> >>>
> >>> Thank you for your attention.
> >>>
> >>> Our tests are run on physical machines.
> >>>
> >>> Fio for KRBD:
> >>> [seq-write]
> >>> description="seq-write"
> >>> direct=1
> >>> ioengine=libaio
> >>> filename=/dev/rbd0
> >>> numjobs=1
> >>> iodepth=256
> >>> group_reporting
> >>> rw=write
> >>> bs=4M
> >>> size=10T
> >>> runtime=180
> >>>
> >>> * /dev/rbd0 is mapped from rbd_pool/image2, so the KRBD and librbd fio
> >>> tests use the same image.
> >>>
> >>> Fio for librbd:
> >>> [global]
> >>> direct=1
> >>> numjobs=1
> >>> ioengine=rbd
> >>> clientname=admin
> >>> pool=rbd_pool
> >>> rbdname=image2
> >>> invalidate=0 # mandatory
> >>> rw=write
> >>> bs=4M
> >>> size=10T
> >>> runtime=180
> >>>
> >>> [rbd_iodepth32]
> >>> iodepth=256
> >>>
> >>> Image info:
> >>> rbd image 'image2':
> >>>         size 50TiB in 13107200 objects
> >>>         order 22 (4MiB objects)
> >>>         data_pool: ec_rbd_pool
> >>>         block_name_prefix: rbd_data.8.148bb6b8b4567
> >>>         format: 2
> >>>         features: layering, data-pool
> >>>         flags:
> >>>         create_timestamp: Wed Nov 14 09:21:18 2018
> >>>
> >>> * data_pool is an EC pool
> >>>
> >>> Pool info:
> >>> pool 8 'rbd_pool' replicated size 2 min_size 1 crush_rule 0 object_hash
> >>> rjenkins pg_num 256 pgp_num 256 last_change 82627 flags hashpspool
> >>> stripe_width 0 application rbd
> >>> pool 9 'ec_rbd_pool' erasure size 6 min_size 5 crush_rule 4 object_hash
> >>> rjenkins pg_num 256 pgp_num 256 last_change 82649 flags
> >>> hashpspool,ec_overwrites stripe_width 16384 application rbd
> >>>
> >>> RBD cache: off (because I think the RBD cache is forced off under tcmu,
> >>> and our cluster will export disks via iSCSI in the future).
> >>>
> >>> Thanks!
> >>>
> >>> On Nov 15, 2018, at 1:22 PM, Gregory Farnum <gfar...@redhat.com> wrote:
> >>>
> >>> You'll need to provide more data about how your test is configured and
> >>> run for us to have a good idea.
> >>> IIRC librbd is often faster than krbd
> >>> because it can support newer features and things, but krbd may have less
> >>> overhead and is not dependent on the VM's driver configuration in QEMU...
> >>>
> >>> On Thu, Nov 15, 2018 at 8:22 AM 赵赵贺东 <zhaohed...@gmail.com> wrote:
> >>>>
> >>>> Hi cephers,
> >>>>
> >>>> All our cluster OSDs are deployed on armhf.
> >>>> Could someone say what a reasonable performance ratio for librbd vs.
> >>>> KRBD is?
> >>>> Or what a reasonable range of performance loss is when we use librbd
> >>>> compared to KRBD?
> >>>> I googled a lot, but I could not find a solid criterion.
> >>>> In fact, it has confused me for a long time.
> >>>>
> >>>> About our tests:
> >>>> In a small cluster (12 OSDs), 4M seq write performance for librbd vs. KRBD
> >>>> is about 0.89 : 1 (177MB/s : 198MB/s).
> >>>> In a big cluster (72 OSDs), 4M seq write performance for librbd vs. KRBD
> >>>> is about 0.38 : 1 (420MB/s : 1080MB/s).
> >>>>
> >>>> We expect that librbd performance can stay close to KRBD even as the
> >>>> number of OSDs increases.
> >>>>
> >>>> PS: librbd performance was tested with both the fio rbd engine and iSCSI
> >>>> (tcmu+librbd).
> >>>>
> >>>> Thanks.
> >>>>
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> ceph-users@lists.ceph.com
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >> --
> >> Jason
>

--
Jason
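
For a rough sense of why the default objecter_inflight_op_bytes limit matters
at iodepth=256 with 4 MiB writes, here is a back-of-the-envelope sketch in
Python. It models the throttle as a plain byte budget, which is a simplifying
assumption for illustration and not a description of how librados implements
it. The function name ops_admitted is hypothetical; the 100 and 300 MiB
values come from this thread, and the 1024 MiB value is only an example of a
limit large enough to cover the full queue depth.

```python
MIB = 1024 * 1024


def ops_admitted(throttle_bytes: int, op_bytes: int, queue_depth: int) -> int:
    """Ops that fit in flight at once under a simple byte budget (assumed model)."""
    return min(queue_depth, throttle_bytes // op_bytes)


queue_depth = 256      # fio iodepth used in the tests above
op_bytes = 4 * MIB     # bs=4M, one full object per write

# Bytes the client would like to have in flight with no throttle at all.
print(f"requested in flight: {queue_depth * op_bytes // MIB} MiB")

for limit_mib in (100, 300, 1024):
    admitted = ops_admitted(limit_mib * MIB, op_bytes, queue_depth)
    print(f"limit {limit_mib} MiB: {admitted} of {queue_depth} ops in flight")
```

Under this simplified model, the requested queue depth corresponds to 1024 MiB
of in-flight data, so a 100 MiB or 300 MiB limit admits only part of the queue
and the remaining writes wait for earlier ones to complete.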