Please use the master branch to test RDMA.

On Sun, Mar 19, 2017 at 11:08 PM, Hung-Wei Chiu (邱宏瑋) <hwc...@cs.nctu.edu.tw> wrote:

> Hi
>
> I want to test Ceph's performance with RDMA, so I built Ceph with RDMA
> support and deployed it into my test environment manually.
>
> I use fio for my performance evaluation, and it works fine when Ceph
> uses *async + posix* as its ms_type.
> After changing the ms_type from *async + posix* to *async + rdma*, some
> OSDs go down during the performance test, which prevents fio from
> finishing its job.
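>
> (A quick way to confirm which messenger a running daemon actually
> picked up, assuming the admin sockets are reachable, is the
> admin-socket config dump; the osd.0 name below is just an example, and
> the command must be run on that OSD's host:
>
>     ceph daemon osd.0 config show | grep ms_type
>
> This reads the live value rather than the file on disk.)
>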
> The log files of those OSDs show that something goes wrong when the
> OSD tries to send a message, as you can see below.
>
> ...
> 2017-03-20 09:43:10.096042 7faac163e700 -1 Infiniband recv_msg got error
> -104: (104) Connection reset by peer
> 2017-03-20 09:43:10.096314 7faac163e700 0 -- 10.0.0.16:6809/23853 >>
> 10.0.0.17:6813/32315 conn(0x563de5282000 :-1 s=STATE_OPEN pgs=264 cs=29
> l=0).fault initiating reconnect
> 2017-03-20 09:43:10.251606 7faac1e3f700 -1 Infiniband send_msg send
> returned error 32: (32) Broken pipe
> 2017-03-20 09:43:10.251755 7faac1e3f700 0 -- 10.0.0.16:6809/23853 >>
> 10.0.0.17:6821/32509 conn(0x563de51f1000 :-1 s=STATE_OPEN pgs=314 cs=24
> l=0).fault initiating reconnect
> 2017-03-20 09:43:10.254103 7faac1e3f700 -1 Infiniband send_msg send
> returned error 32: (32) Broken pipe
> 2017-03-20 09:43:10.254375 7faac1e3f700 0 -- 10.0.0.16:6809/23853 >>
> 10.0.0.15:6821/48196 conn(0x563de514b000 :6809 s=STATE_OPEN pgs=275 cs=30
> l=0).fault initiating reconnect
> 2017-03-20 09:43:10.260622 7faac1e3f700 -1 Infiniband send_msg send
> returned error 32: (32) Broken pipe
> 2017-03-20 09:43:10.260693 7faac1e3f700 0 -- 10.0.0.16:6809/23853 >>
> 10.0.0.15:6805/47835 conn(0x563de537d800 :-1 s=STATE_OPEN pgs=310 cs=11
> l=0).fault with nothing to send, going to standby
> 2017-03-20 09:43:10.264621 7faac163e700 -1 Infiniband send_msg send
> returned error 32: (32) Broken pipe
> 2017-03-20 09:43:10.264682 7faac163e700 0 -- 10.0.0.16:6809/23853 >>
> 10.0.0.15:6829/48397 conn(0x563de5fdb000 :-1 s=STATE_OPEN pgs=231 cs=23
> l=0).fault with nothing to send, going to standby
> 2017-03-20 09:43:10.291832 7faac163e700 -1 Infiniband send_msg send
> returned error 32: (32) Broken pipe
> 2017-03-20 09:43:10.291895 7faac163e700 0 -- 10.0.0.16:6809/23853 >>
> 10.0.0.17:6817/32412 conn(0x563de50f5800 :-1 s=STATE_OPEN pgs=245 cs=25
> l=0).fault initiating reconnect
> 2017-03-20 09:43:10.387540 7faac2e41700 -1 Infiniband send_msg send
> returned error 32: (32) Broken pipe
> 2017-03-20 09:43:10.387565 7faac2e41700 -1 Infiniband send_msg send
> returned error 32: (32) Broken pipe
> 2017-03-20 09:43:10.387635 7faac2e41700 0 -- 10.0.0.16:6809/23853 >>
> 10.0.0.17:6801/32098 conn(0x563de51ab800 :6809 s=STATE_OPEN pgs=268 cs=23
> l=0).fault with nothing to send, going to standby
> 2017-03-20 09:43:11.453373 7faabdee0700 -1 osd.10 902 heartbeat_check: no
> reply from 10.0.0.15:6803 osd.0 since back 2017-03-20 09:42:50.610507
> front 2017-03-20 09:42:50.610507 (cutoff 2017-03-20 09:42:51.453371)
> 2017-03-20 09:43:11.453422 7faabdee0700 -1 osd.10 902 heartbeat_check: no
> reply from 10.0.0.15:6807 osd.1 since back 2017-03-20 09:42:50.610507
> front 2017-03-20 09:42:50.610507 (cutoff 2017-03-20 09:42:51.453371)
> 2017-03-20 09:43:11.453435 7faabdee0700 -1 osd.10 902 heartbeat_check: no
> reply from 10.0.0.15:6811 osd.2 since back 2017-03-20 09:42:50.610507
> front 2017-03-20 09:42:50.610507 (cutoff 2017-03-20 09:42:51.453371)
> 2017-03-20 09:43:11.453444 7faabdee0700 -1 osd.10 902 heartbeat_check: no
> reply from 10.0.0.15:6815 osd.3 since back 2017-03-20 09:42:50.610507
> front 2017-03-20 09:42:50.610507 (cutoff 2017-03-20 09:42:51.453371)
> *...*
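>
> (For anyone digging into this: the messenger debug level can be raised
> on the fly to capture more detail around these faults; this is a
> standard knob, nothing RDMA-specific:
>
>     ceph tell osd.* injectargs '--debug_ms 5'
>
> Level 5 is an arbitrary choice here; 20 is the most verbose and very
> noisy.)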
>
>
> The following is my environment.
> *[Software]*
> *Ceph Version*: ceph version 12.0.0-1356-g7ba32cb (built myself from
> the master branch)
>
> *Deployment*: Without ceph-deploy and systemd; every daemon is invoked
> manually.
>
> *Host*: Ubuntu 16.04.1 LTS (x86_64), with Linux kernel 4.4.0-66-generic.
>
> *NIC*: Ethernet controller: Mellanox Technologies MT27520 Family
> [ConnectX-3 Pro]
>
> *NIC Driver*: MLNX_OFED_LINUX-4.0-1.0.1.0 (OFED-4.0-1.0.1)
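>
> (Raw RDMA connectivity between these hosts can be sanity-checked
> outside Ceph with the tools that ship with OFED, assuming perftest is
> installed; mlx4_0 and the 10.0.0.15 address below are taken from this
> setup:
>
>     ibv_devinfo -d mlx4_0          # check the port state is PORT_ACTIVE
>     ib_send_bw -d mlx4_0           # server side, run on one host
>     ib_send_bw -d mlx4_0 10.0.0.15 # client side, run on another host
>
> If this already fails, the problem is below Ceph.)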
>
>
> *[Configuration]*
> Ceph.conf
>
> [global]
> fsid = 0612cc7e-6239-456c-978b-b4df781fe831
> mon initial members = ceph-1,ceph-2,ceph-3
> mon host = 10.0.0.15,10.0.0.16,10.0.0.17
> osd pool default size = 2
> osd pool default pg num = 1024
> osd pool default pgp num = 1024
> ms_type = async+rdma
> ms_async_rdma_device_name = mlx4_0
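>
> (One known pitfall with the RDMA messenger is the locked-memory limit:
> ibverbs has to register and pin memory, and the default memlock ulimit
> is usually far too small. Since the daemons here are started by hand, a
> minimal sketch, assuming a root shell; the ceph-osd flags are
> illustrative only:
>
>     ulimit -l unlimited   # lift the locked-memory limit for this shell
>     ceph-osd -i 10        # then start the daemon from the same shell
>
> Under systemd the equivalent would be LimitMEMLOCK=infinity in the unit
> file.)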
>
> Fio.conf
>
> [global]
>
> ioengine=rbd
> clientname=admin
> pool=rbd
> rbdname=rbd
> clustername=ceph
> runtime=120
> iodepth=128
> numjobs=6
> group_reporting
> size=256G
> direct=1
> ramp_time=5
> [r75w25]
> bs=4k
> rw=randrw
> rwmixread=75
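>
> (For reference, assuming the job file above is saved as fio.conf, it is
> run as:
>
>     fio fio.conf
>
> The rbd ioengine requires an fio build with RBD support and reads the
> cluster config and keyring like any other librbd client.)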
>
>
> *[Cluster Env]*
>
>    1. Three nodes in total.
>    2. Three Ceph monitors, one on each node.
>    3. Eight Ceph OSDs per node (24 OSDs in total).
>
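> (When OSDs flap like this, the standard status commands show which ones
> went down and when; nothing RDMA-specific is needed:
>
>     ceph -s
>     ceph osd tree | grep down
>
> Correlating the down timestamps with the send_msg errors above may help
> narrow things down.)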
>
> Thanks
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
