Hi

I want to test the performance of Ceph with RDMA, so I built Ceph with
RDMA support and deployed it into my test environment manually.
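
For the record, as far as I understand RDMA support is controlled by the
WITH_RDMA cmake option, so the build is roughly like this (from memory,
please double-check the exact flags):

./do_cmake.sh -DWITH_RDMA=ON
cd build && make -j$(nproc)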

I use fio for the performance evaluation, and it works fine as long as
Ceph uses *async + posix* as its ms_type.
After changing the ms_type from *async + posix* to *async + rdma*, some
OSDs turn down during the performance test, which prevents fio from
finishing its job.
The log files of those OSDs show that something goes wrong when the OSD
tries to send a message, as you can see below.

...
2017-03-20 09:43:10.096042 7faac163e700 -1 Infiniband recv_msg got error
-104: (104) Connection reset by peer
2017-03-20 09:43:10.096314 7faac163e700 0 -- 10.0.0.16:6809/23853 >>
10.0.0.17:6813/32315 conn(0x563de5282000 :-1 s=STATE_OPEN pgs=264 cs=29
l=0).fault initiating reconnect
2017-03-20 09:43:10.251606 7faac1e3f700 -1 Infiniband send_msg send
returned error 32: (32) Broken pipe
2017-03-20 09:43:10.251755 7faac1e3f700 0 -- 10.0.0.16:6809/23853 >>
10.0.0.17:6821/32509 conn(0x563de51f1000 :-1 s=STATE_OPEN pgs=314 cs=24
l=0).fault initiating reconnect
2017-03-20 09:43:10.254103 7faac1e3f700 -1 Infiniband send_msg send
returned error 32: (32) Broken pipe
2017-03-20 09:43:10.254375 7faac1e3f700 0 -- 10.0.0.16:6809/23853 >>
10.0.0.15:6821/48196 conn(0x563de514b000 :6809 s=STATE_OPEN pgs=275 cs=30
l=0).fault initiating reconnect
2017-03-20 09:43:10.260622 7faac1e3f700 -1 Infiniband send_msg send
returned error 32: (32) Broken pipe
2017-03-20 09:43:10.260693 7faac1e3f700 0 -- 10.0.0.16:6809/23853 >>
10.0.0.15:6805/47835 conn(0x563de537d800 :-1 s=STATE_OPEN pgs=310 cs=11
l=0).fault with nothing to send, going to standby
2017-03-20 09:43:10.264621 7faac163e700 -1 Infiniband send_msg send
returned error 32: (32) Broken pipe
2017-03-20 09:43:10.264682 7faac163e700 0 -- 10.0.0.16:6809/23853 >>
10.0.0.15:6829/48397 conn(0x563de5fdb000 :-1 s=STATE_OPEN pgs=231 cs=23
l=0).fault with nothing to send, going to standby
2017-03-20 09:43:10.291832 7faac163e700 -1 Infiniband send_msg send
returned error 32: (32) Broken pipe
2017-03-20 09:43:10.291895 7faac163e700 0 -- 10.0.0.16:6809/23853 >>
10.0.0.17:6817/32412 conn(0x563de50f5800 :-1 s=STATE_OPEN pgs=245 cs=25
l=0).fault initiating reconnect
2017-03-20 09:43:10.387540 7faac2e41700 -1 Infiniband send_msg send
returned error 32: (32) Broken pipe
2017-03-20 09:43:10.387565 7faac2e41700 -1 Infiniband send_msg send
returned error 32: (32) Broken pipe
2017-03-20 09:43:10.387635 7faac2e41700 0 -- 10.0.0.16:6809/23853 >>
10.0.0.17:6801/32098 conn(0x563de51ab800 :6809 s=STATE_OPEN pgs=268 cs=23
l=0).fault with nothing to send, going to standby
2017-03-20 09:43:11.453373 7faabdee0700 -1 osd.10 902 heartbeat_check: no
reply from 10.0.0.15:6803 osd.0 since back 2017-03-20 09:42:50.610507 front
2017-03-20 09:42:50.610507 (cutoff 2017-03-20 09:42:51.453371)
2017-03-20 09:43:11.453422 7faabdee0700 -1 osd.10 902 heartbeat_check: no
reply from 10.0.0.15:6807 osd.1 since back 2017-03-20 09:42:50.610507 front
2017-03-20 09:42:50.610507 (cutoff 2017-03-20 09:42:51.453371)
2017-03-20 09:43:11.453435 7faabdee0700 -1 osd.10 902 heartbeat_check: no
reply from 10.0.0.15:6811 osd.2 since back 2017-03-20 09:42:50.610507 front
2017-03-20 09:42:50.610507 (cutoff 2017-03-20 09:42:51.453371)
2017-03-20 09:43:11.453444 7faabdee0700 -1 osd.10 902 heartbeat_check: no
reply from 10.0.0.15:6815 osd.3 since back 2017-03-20 09:42:50.610507 front
2017-03-20 09:42:50.610507 (cutoff 2017-03-20 09:42:51.453371)
...
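
After these send errors the OSD stops getting heartbeat replies from its
peers (the heartbeat_check lines above), so I assume that is why those
OSDs eventually get marked down and fio stalls.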


My environment is as follows.
*[Software]*
*Ceph Version*: ceph version 12.0.0-1356-g7ba32cb (built by myself from
the master branch)

*Deployment*: Without ceph-deploy or systemd; I just start every daemon
manually.

*Host*: Ubuntu 16.04.1 LTS (x86_64), with Linux kernel 4.4.0-66-generic.

*NIC*: Ethernet controller: Mellanox Technologies MT27520 Family
[ConnectX-3 Pro]

*NIC Driver*: MLNX_OFED_LINUX-4.0-1.0.1.0 (OFED-4.0-1.0.1)


*[Configuration]*
ceph.conf

[global]
fsid = 0612cc7e-6239-456c-978b-b4df781fe831
mon initial members = ceph-1,ceph-2,ceph-3
mon host = 10.0.0.15,10.0.0.16,10.0.0.17
osd pool default size = 2
osd pool default pg num = 1024
osd pool default pgp num = 1024
ms_type = async+rdma
ms_async_rdma_device_name = mlx4_0
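
All other RDMA-related options are left at their defaults. For reference,
these are the RDMA messenger knobs I am aware of from reading
src/common/config_opts.h on master (the values below are only examples,
not what I actually run, so treat them as assumptions):

ms_async_rdma_port_num = 1
ms_async_rdma_buffer_size = 131072
ms_async_rdma_send_buffers = 1024
ms_async_rdma_receive_buffers = 1024
ms_async_rdma_polling_us = 1000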

fio.conf

[global]

ioengine=rbd
clientname=admin
pool=rbd
rbdname=rbd
clustername=ceph
runtime=120
iodepth=128
numjobs=6
group_reporting
size=256G
direct=1
ramp_time=5
[r75w25]
bs=4k
rw=randrw
rwmixread=75


*[Cluster Env]*

   1. Three nodes in total.
   2. 3 Ceph monitors, one on each node.
   3. 8 Ceph OSDs on each node (24 OSDs in total).
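
In case anyone wants to rule out a device-name mismatch: the
ms_async_rdma_device_name = mlx4_0 value above is what I expect
ibv_devinfo (from the OFED tools) to report for this ConnectX-3 card;
it is easy to cross-check with:

ibv_devinfo -d mlx4_0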


Thanks