On Thu, Nov 15, 2018 at 2:30 PM 赵赵贺东 <zhaohed...@gmail.com> wrote:
>
> I tested in the 12-OSD cluster, changing objecter_inflight_op_bytes from 
> 100MB to 300MB; performance did not change noticeably.
> But librbd already performs well in the 12-OSD cluster, so this test 
> tells me little.
> >>>> In a small cluster (12 OSDs), 4M seq write performance for librbd vs. 
> >>>> KRBD is about 0.89 : 1 (177MB/s : 198MB/s).
> >>>> In a big cluster (72 OSDs), 4M seq write performance for librbd vs. 
> >>>> KRBD is about 0.38 : 1 (420MB/s : 1080MB/s).
>
>
> Our problem is librbd's poor performance in the big cluster (72 OSDs).
> I cannot test with 72 OSDs right now because other tests are running; 
> I will test with 72 OSDs when our cluster is ready.
>
> It is a little hard to understand that objecter_inflight_op_bytes=100MB 
> works well in the 12-OSD cluster but poorly in the 72-OSD cluster.
> Does objecter_inflight_op_bytes have no effect on krbd, and only affect 
> librbd?

Correct -- the "ceph.conf" config settings are for user-space tooling
only. Since you are writing full 4MiB objects in your test, any
user-space performance degradation is probably going to be in the
librados layer and below. That 100 MiB limit setting will block the
IO path while it waits for in-flight IO to complete. You might also
just be hitting the default throughput of the lower-level messenger
code, so perhaps you need to throw more threads at it
(ms_async_op_threads / ms_async_max_op_threads) or change its
throttles (ms_dispatch_throttle_bytes). Also, depending on your
cluster and krbd versions, perhaps the OSDs are telling your clients
to back off but only librados is responding. Finally, consider the
validity of your test case -- does it really match the workload you
are trying to optimize for?
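
If you want to experiment with the messenger side, the knobs would go
in the client's ceph.conf. A minimal sketch, with illustrative starting
values rather than tuned recommendations:

  [client]
  # async messenger worker threads (defaults are 3 and 5)
  ms_async_op_threads = 5
  ms_async_max_op_threads = 10
  # messenger dispatch throttle; default is 104857600 (100 MiB)
  ms_dispatch_throttle_bytes = 536870912

Change one setting at a time and re-measure, since each of these trades
extra client CPU/memory for throughput.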

> Thanks.
>
>
>
> > On Nov 15, 2018, at 3:50 PM, 赵赵贺东 <zhaohed...@gmail.com> wrote:
> >
> > Thank you for your suggestion.
> > It has given me a lot of inspiration.
> >
> >
> > I will test as you suggest, and browse through src/common/config_opts.h 
> > to see whether I can find some performance-related configs.
> >
> > But our OSD node hardware itself is very poor; that is the truth we 
> > have to face.
> > Each ARM board carries two OSDs with 2GB of memory and 2*10TB HDDs, so 
> > each OSD has 1GB of memory to support a 10TB HDD. We must try to make 
> > the cluster work as well as we can.
> >
> >
> > Thanks.
> >
> >> On Nov 15, 2018, at 2:08 PM, Jason Dillaman <jdill...@redhat.com> wrote:
> >>
> >> Attempting to send 256 concurrent 4MiB writes via librbd will pretty
> >> quickly hit the default "objecter_inflight_op_bytes = 100 MiB" limit,
> >> which will drastically slow (stall) librados. I would recommend
> >> re-testing librbd w/ a much higher throttle override.
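> >> As a rough sketch (untuned, illustrative values), the override could
> >> go in the client's ceph.conf:
> >>
> >>   [client]
> >>   # default is 104857600 (100 MiB)
> >>   objecter_inflight_op_bytes = 1073741824
> >>
> >> With iodepth=256 and bs=4M, fio can queue up to 256 * 4 MiB = 1 GiB of
> >> writes, so the throttle needs to be at least that large to avoid
> >> stalling the IO path.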
> >> On Thu, Nov 15, 2018 at 11:34 AM 赵赵贺东 <zhaohed...@gmail.com> wrote:
> >>>
> >>> Thank you for your attention.
> >>>
> >>> Our tests are run on physical machines.
> >>>
> >>> Fio for KRBD:
> >>> [seq-write]
> >>> description="seq-write"
> >>> direct=1
> >>> ioengine=libaio
> >>> filename=/dev/rbd0
> >>> numjobs=1
> >>> iodepth=256
> >>> group_reporting
> >>> rw=write
> >>> bs=4M
> >>> size=10T
> >>> runtime=180
> >>>
> >>> * /dev/rbd0 is mapped from rbd_pool/image2, so the KRBD and librbd fio 
> >>> tests use the same image.
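> >>>
> >>> (For reference, the device would have been mapped with something like
> >>>
> >>>   rbd map rbd_pool/image2
> >>>
> >>> which yields /dev/rbd0 for the first mapped image.)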
> >>>
> >>> Fio for librbd:
> >>> [global]
> >>> direct=1
> >>> numjobs=1
> >>> ioengine=rbd
> >>> clientname=admin
> >>> pool=rbd_pool
> >>> rbdname=image2
> >>> invalidate=0    # mandatory
> >>> rw=write
> >>> bs=4M
> >>> size=10T
> >>> runtime=180
> >>>
> >>> [rbd_iodepth32]
> >>> iodepth=256
> >>>
> >>>
> >>> Image info:
> >>> rbd image 'image2':
> >>> size 50TiB in 13107200 objects
> >>> order 22 (4MiB objects)
> >>> data_pool: ec_rbd_pool
> >>> block_name_prefix: rbd_data.8.148bb6b8b4567
> >>> format: 2
> >>> features: layering, data-pool
> >>> flags:
> >>> create_timestamp: Wed Nov 14 09:21:18 2018
> >>>
> >>> * data_pool is a EC pool
> >>>
> >>> Pool info:
> >>> pool 8 'rbd_pool' replicated size 2 min_size 1 crush_rule 0 object_hash 
> >>> rjenkins pg_num 256 pgp_num 256 last_change 82627 flags hashpspool 
> >>> stripe_width 0 application rbd
> >>> pool 9 'ec_rbd_pool' erasure size 6 min_size 5 crush_rule 4 object_hash 
> >>> rjenkins pg_num 256 pgp_num 256 last_change 82649 flags 
> >>> hashpspool,ec_overwrites stripe_width 16384 application rbd
> >>>
> >>>
> >>> RBD cache: off. (Because I think with tcmu the rbd cache is forced 
> >>> off anyway, and our cluster will export disks via iSCSI in the future.)
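> >>>
> >>> For completeness, the cache is controlled by the standard "rbd cache"
> >>> option in the client section of ceph.conf:
> >>>
> >>>   [client]
> >>>   rbd cache = false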
> >>>
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> On Nov 15, 2018, at 1:22 PM, Gregory Farnum <gfar...@redhat.com> wrote:
> >>>
> >>> You'll need to provide more data about how your test is configured and 
> >>> run for us to have a good idea. IIRC librbd is often faster than krbd 
> >>> because it can support newer features and things, but krbd may have less 
> >>> overhead and is not dependent on the VM's driver configuration in QEMU...
> >>>
> >>> On Thu, Nov 15, 2018 at 8:22 AM 赵赵贺东 <zhaohed...@gmail.com> wrote:
> >>>>
> >>>> Hi cephers,
> >>>>
> >>>>
> >>>> All our cluster OSDs are deployed on armhf.
> >>>> Could someone say what a reasonable performance ratio for librbd vs. 
> >>>> KRBD is, or a reasonable performance-loss range when using librbd 
> >>>> compared to KRBD?
> >>>> I googled a lot, but I could not find a solid criterion.
> >>>> In fact, it has confused me for a long time.
> >>>>
> >>>> About our tests:
> >>>> In a small cluster (12 OSDs), 4M seq write performance for librbd vs. 
> >>>> KRBD is about 0.89 : 1 (177MB/s : 198MB/s).
> >>>> In a big cluster (72 OSDs), 4M seq write performance for librbd vs. 
> >>>> KRBD is about 0.38 : 1 (420MB/s : 1080MB/s).
> >>>>
> >>>> We expect that even as the OSD count increases, librbd performance 
> >>>> can stay close to KRBD's.
> >>>>
> >>>> PS: librbd performance was tested both with the fio rbd engine and 
> >>>> via iSCSI (tcmu+librbd).
> >>>>
> >>>> Thanks.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Jason
> >
>


-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
