Thanks to both of you for your replies.

Hi Haomai,

If we compare RDMA with the TCP/IP stack, as far as I know, we can use RDMA to
offload network traffic and reduce CPU usage, which means the other components
can use more CPU to improve performance metrics such as IOPS. Is that correct?


Hi Deepak,

Let me describe my environment in more detail, and I hope you can give me more
advice about it.

[Ceph Cluster]

   - 1 pool
   - 1 RBD image (created as sketched below)
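
For reference, the pool and image correspond to the pool=rbd / rbdname=rbd /
size=256G settings in the fio config below; they were created with commands
along these lines (the PG count shown just matches the osd pool default pg num
in my ceph.conf, so take it as an illustration):

  # create the pool and a 256G image for the fio rbd engine
  ceph osd pool create rbd 1024 1024
  rbd create rbd/rbd --size 256G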

[Host Daemon]

   - 1 ceph-mon
   - 8 ceph-osd
   - 1 fio server (fio is compiled with librbd, and librbd is built with RDMA
   support)

[fio config]

 [global]
 ioengine=rbd
 clientname=admin
 pool=rbd
 rbdname=rbd
 clustername=ceph
 runtime=120
 iodepth=128
 numjobs=6
 group_reporting
 size=256G
 direct=1
 ramp_time=5

 [r75w25]
 bs=4k
 rw=randrw
 rwmixread=75



In my RDMA experiment, I start the fio client on host 1, and it triggers 3 fio
servers (one on each host) to run the random read/write workload against the
specified RBD image (see the example invocation below).
Although I don't specify the public/cluster network addresses in ceph.conf, I
believe all cluster traffic goes over the 10G network, since I only used the
10G IP addresses in my manual deployment.
Since ceph.conf sets ms_type to RDMA, I think the connection between fio and
the RBD image is based on RDMA.
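
For clarity, the fio processes are started roughly like this (the host names
are placeholders for my three hosts, and r75w25.fio is the job file shown
above):

  # on each of the three hosts: start fio in server mode
  fio --server

  # on host 1: drive the job on all three servers
  fio --client=host1 r75w25.fio --client=host2 r75w25.fio --client=host3 r75w25.fio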

During the fio run, I observe the following system metrics (example commands
are shown after the list).

1. System CPU usage
2. NIC (1G) throughput
3. NIC (10G) throughput
4. SSD I/O statistics
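
These can be observed with standard tools, for example (the interval and
options here are just illustrative):

  # CPU usage per process (fio servers, ceph-osd, etc.)
  top
  # per-NIC throughput, sampled every second
  sar -n DEV 1
  # per-device I/O statistics for the SSDs
  iostat -x 1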


Only the CPU usage is saturated (100%), consumed by the fio servers and the
ceph-osd daemons; the other three metrics still have plenty of headroom, so I
think the bottleneck in my environment is CPU.
Based on these observations and the concept of RDMA, I assume RDMA can offload
the network traffic, reduce CPU usage, and give the other components more CPU
to work with.
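
A rough way to check how much of that CPU is actually spent in the
network/messenger path (and could therefore be offloaded) would be to profile
a single OSD, e.g.:

  # sample where one ceph-osd process spends its CPU time
  perf top -p $(pidof -s ceph-osd)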

I think that if we can use RDMA for the cluster/private network, it can offload
the intra-cluster network traffic, reduce CPU usage, and free up more CPU for
other components. A rough sketch of what I have in mind is below.
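
Something like the following ceph.conf fragment is what I mean, assuming the
ms_cluster_type option behaves as I expect (the subnets are placeholders for
my 1G and 10G networks):

 [global]
 public network = 192.168.0.0/24     # 1G NICs, plain TCP/IP
 cluster network = 10.0.0.0/24       # 10G Mellanox NICs
 ms_cluster_type = async+rdma        # RDMA only on the cluster network
 ms_async_rdma_device_name = mlx4_0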

If I have misunderstood anything, please correct me.

Thanks for your help!







Best Regards,

Hung-Wei Chiu(邱宏瑋)
--
Computer Center, Department of Computer Science
National Chiao Tung University

2017-03-24 2:22 GMT+08:00 Deepak Naidu <dna...@nvidia.com>:

> RDMA is of interest to me. So my below comment.
>
>
>
> >> What surprised me is that the result of RDMA mode is almost the same
> as the basic mode, the iops, latency, throughput, etc.
>
>
>
> Pardon my knowledge here, but if I read your ceph.conf and your notes
> correctly, it seems that you are using RDMA only for the “cluster/private
> network”? If so, how do you expect RDMA to improve client
> IOPS/latency/throughput?
>
>
>
>
>
> --
>
> Deepak
>
>
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Haomai Wang
> *Sent:* Thursday, March 23, 2017 4:34 AM
> *To:* Hung-Wei Chiu (邱宏瑋)
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] The performance of ceph with RDMA
>
>
>
>
>
>
>
> On Thu, Mar 23, 2017 at 5:49 AM, Hung-Wei Chiu (邱宏瑋) <
> hwc...@cs.nctu.edu.tw> wrote:
>
> Hi,
>
>
>
> I use the latest code (master branch, updated on 2017/03/22) to build Ceph
> with RDMA, and use fio to test its IOPS/latency/throughput.
>
>
>
> In my environment, I set up 3 hosts; the details of each host are listed below.
>
>
>
> OS: ubuntu 16.04
>
> Storage: SSD * 4 (256G * 4)
>
> Memory: 64GB.
>
> NICs: two NICs, one (Intel 1G) for the public network and the other (Mellanox
> 10G) for the private network.
>
>
>
> There are 3 monitors and 24 OSDs equally distributed across the 3 hosts, which
> means each host contains 1 mon and 8 OSDs.
>
>
>
> For my experiment, I use two configs, basic and RDMA.
>
>
>
> Basic
>
> [global]
>
> fsid = 0612cc7e-6239-456c-978b-b4df781fe831
>
> mon initial members = ceph-1,ceph-2,ceph-3
>
> mon host = 10.0.0.15,10.0.0.16,10.0.0.17
>
> osd pool default size = 2
>
> osd pool default pg num = 1024
>
> osd pool default pgp num = 1024
>
>
>
>
>
> RDMA
>
> [global]
>
> fsid = 0612cc7e-6239-456c-978b-b4df781fe831
>
> mon initial members = ceph-1,ceph-2,ceph-3
>
> mon host = 10.0.0.15,10.0.0.16,10.0.0.17
>
> osd pool default size = 2
>
> osd pool default pg num = 1024
>
> osd pool default pgp num = 1024
>
> ms_type=async+rdma
>
> ms_async_rdma_device_name = mlx4_0
>
>
>
>
>
> What surprised me is that the results in RDMA mode are almost the same as in
> the basic mode: the IOPS, latency, throughput, etc.
>
> I also tried different fio parameter patterns, such as the read/write ratio
> and random versus sequential operations.
>
> All results are the same.
>
>
>
> Yes, most of the latency comes from other components now, although we still
> want to avoid the extra copy on the RDMA side.
>
>
>
> So the current RDMA backend only means it can be a choice compared to the
> TCP/IP network; more of the benefit needs to come from other components.
>
>
>
>
>
> In order to figure out what's going on, I took the following steps.
>
>
>
> 1. Follow this article (https://community.mellanox.com/docs/DOC-2086) to
> verify my RDMA environment.
>
> 2. To make sure the network traffic is transmitted over RDMA, I dumped the
> traffic on the private network, and the answer is yes, it uses RDMA.
>
> 3. Modify the ms_async_rdma_buffer_size to (256 << 10), no change.
>
> 4. Modify the ms_async_rdma_send_buffers to 2048, no change.
>
> 5. Modify the ms_async_rdma_receive_buffers to 2048, no change.
>
>
>
> After the above operations, I guess my Ceph environment may not be set up in
> a way that allows RDMA to improve performance.
>
>
>
> Does anyone know what kind of Ceph environment (replica size, # of OSDs, # of
> mons, etc.) works well with RDMA?
>
>
>
> Thanks in advance.
>
>
>
>
>
>
> Best Regards,
>
> Hung-Wei Chiu(邱宏瑋)
>
> --
> Computer Center, Department of Computer Science
> National Chiao Tung University
>
>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
