Re: asynchronous operation with poll()
On 11/10/10 10:30, Andrea Gozzelino wrote:
> Hi Jonathan,
> I wrote a test (latency and transfer speed) with RDMA. Server and client run the same code and exchange buffers of defined sizes n times (in a loop). The makefile.txt explains how to use the code.
> I tested Intel NetEffect NE020 E10G81GP cards with this code and found a minimum latency of about 11 us, a maximum transfer speed of about 9.6 GBytes, and CPU usage of up to 90% on the client side. The last value is not good for us.

Hi Andrea,

Thanks for the code. With the advice from Jason I have changed my test program to get reliable communication using 1 Mbyte buffers. The CPU usage is less than 2% on both client and server for 10 Gb throughput. I have a Chelsio S310CR.

I find the poll() approach more natural, as I have previous experience with conventional sockets-based programming. The rdma_client/rdma_server example programs from librdmacm were the easiest starting point; I incrementally changed them from synchronous to asynchronous operation and moved the internals of the high-level functions into my own code piece by piece. The learning curve is very steep :)

I found this paper quite interesting: www.systems.ethz.ch/research/awards/minimizingthehidden.pdf

Cheers,
Jonathan.
-- 
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: asynchronous operation with poll()
On 11/09/10 20:44, Jason Gunthorpe wrote:
> Broadly it looks to me like your actions are in the wrong order. A poll-based RDMA loop should look like this:
> - exit poll
> - Check poll bit
> - call ibv_get_cq_event
> - call ibv_req_notify_cq
> - repeatedly call ibv_poll_cq (while rc == num requested)
> - Issue new work
> - return to poll
>
> Generally, for your own sanity, I recommend splitting this into 3 functions:
> - Do the stuff with ibv_get_cq_event
> - Drain and process WCs
> - Issue new work

Hi Jason,

Thanks very much for your advice. I had misunderstood the relationship between CQ events and available WCs.

> doing multithreaded things. Using num_send is wrong, I use this:
> bool checkCQPoll(struct pollfd&p)

Right - using a function like your checkCQPoll has sorted out the behaviour of the poll() loop.

> Continually posting sends and recvs will get you into trouble; you will run out of recvs and get RNRs. These days the wisdom for implementing RDMA is that you should have explicit message flow

OK - I appreciate that a real-world protocol ought to have flow control rather than just sending as fast as possible. I've been trying to exercise the interfaces as far as possible and make sure my RDMA implementation is solid before building something real on top of it.

> control. I.e. for something simple like this you could say that getting a recv means another send is OK, but you still need a mechanism to wait for a send buffer to be returned on the send CQ - there is no ordering guarantee.

Could I get some clarification on where there is no ordering guarantee? Do you mean that the WCs do not necessarily come back in the order in which the sends were posted?

Many Thanks,
Jonathan.
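Jason's recipe above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not code from the thread: `struct wc_stub`, `poll_cq_fn`, `drain_cq`, `fake_poll_cq` and `count_success` are hypothetical names, and the fake CQ only stands in for `ibv_poll_cq()` so the drain logic can be tried without an adapter; real code would use `struct ibv_wc` and `ibv_poll_cq()` instead. The comment on `drain_cq` lists the verbs calls around it. The detail that tripped up the original loop is that `ibv_ack_cq_events()` counts events returned by `ibv_get_cq_event()`, not work completions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ibv_wc: only the fields used here. */
struct wc_stub {
    uint64_t wr_id;
    int status;            /* 0 plays the role of IBV_WC_SUCCESS */
};

/* Same shape as ibv_poll_cq(cq, num_entries, wc): returns how many
 * completions were written to wc, 0 if the CQ is empty, <0 on error. */
typedef int (*poll_cq_fn)(void *cq, int num_entries, struct wc_stub *wc);
typedef void (*wc_handler)(const struct wc_stub *wc, void *arg);

#define WC_BATCH 10

/* "Drain and process WCs": keep polling while a full batch comes back,
 * since a full batch means more completions may still be queued.
 * Returns the number of completions handled, or -1 on a poll error.
 *
 * Intended sequence once poll() reports the channel fd readable:
 *   ibv_get_cq_event(channel, &cq, &ctx);  -- consume exactly one event
 *   ibv_ack_cq_events(cq, 1);              -- ack events, not WCs
 *   ibv_req_notify_cq(cq, 0);              -- re-arm before draining
 *   drain_cq(...);                         -- may legitimately find 0 WCs
 *   ...then issue new work and return to poll(). */
int drain_cq(void *cq, poll_cq_fn poll_cq, wc_handler handle, void *arg)
{
    struct wc_stub wc[WC_BATCH];
    int total = 0;
    int n;

    do {
        n = poll_cq(cq, WC_BATCH, wc);
        if (n < 0)
            return -1;
        for (int i = 0; i < n; i++)
            handle(&wc[i], arg);
        total += n;
    } while (n == WC_BATCH);
    return total;
}

/* Fake CQ holding `pending` completions, for trying the loop offline. */
struct fake_cq {
    int pending;
    uint64_t next_wr_id;
};

int fake_poll_cq(void *cq, int num_entries, struct wc_stub *wc)
{
    struct fake_cq *f = cq;
    assert(num_entries > 0);
    int n = f->pending < num_entries ? f->pending : num_entries;
    for (int i = 0; i < n; i++) {
        wc[i].wr_id = f->next_wr_id++;
        wc[i].status = 0;
    }
    f->pending -= n;
    return n;
}

/* Example handler: counts successful completions into *(int *)arg. */
void count_success(const struct wc_stub *wc, void *arg)
{
    if (wc->status == 0)
        (*(int *)arg)++;
}
```

Re-arming with `ibv_req_notify_cq()` before draining means a completion that lands during the drain still raises a fresh event, so nothing is left sitting on the CQ while poll() sleeps. The cost is that an event can occasionally arrive with zero WCs behind it, which is why the drain has to tolerate an empty first poll.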
asynchronous operation with poll()
I have a client and server test program, written as close to a conventional sockets application as possible, to explore fully asynchronous communication, and I am encountering difficulty. Both programs run the same code in a thread, sending buffers to each other as fast as possible. On the client side only, my poll() call never blocks and cm_id->send_cq_channel->fd always seems to be readable. This causes the program to loop wildly and consume 100% CPU. Any ideas?

I have ensured that O_NONBLOCK is set on the underlying file descriptors. I'm not sure why the server side runs with almost no CPU usage while the client does not.

Here is the client/server loop:

    struct ibv_mr *mr;
    int ret;
    int send_buf_num = 0;
    int recv_buf_num = 0;
    #define NUM_BUFFERS 20
    #define SIZE 1024*1024
    uint8_t *buffer = (uint8_t*)malloc(SIZE * NUM_BUFFERS * 2);
    uint8_t *send_msg[NUM_BUFFERS];
    uint8_t *recv_msg[NUM_BUFFERS];
    for(int i=0; irecv_cq, 0);

    //
    //
    // main loop
    while(ret == 0) {
        memset(fds, 0, sizeof(pollfd) * NUM_FDS);
        fds[POLL_CM].fd = cm_channel->fd;
        fds[POLL_CM].events = POLLIN;
        fds[POLL_RECV_CQ].fd = cm_id->recv_cq_channel->fd;
        fds[POLL_RECV_CQ].events = POLLIN;
        fds[POLL_SEND_CQ].fd = cm_id->send_cq_channel->fd;
        fds[POLL_SEND_CQ].events = POLLIN;
        fds[POLL_WAKE].fd = wake_fds[0];
        fds[POLL_WAKE].events = POLLIN;

        int nready = poll(fds, NUM_FDS, -1);
        if(nready < 0) {
            perror("poll");
        }

        if(fds[POLL_CM].revents & POLLIN) {
            struct rdma_cm_event *cm_event;
            ret = rdma_get_cm_event(cm_channel, &cm_event);
            if(ret) {
                perror("client connection rdma_get_cm_event");
            }
            fprintf(stderr, "Got cm event %s\n", rdma_event_str(cm_event->event));
            if(cm_event->event == RDMA_CM_EVENT_ESTABLISHED) {
                //send as soon as we are connected
                ibv_req_notify_cq(cm_id->send_cq, 0);
                ret = rdma_post_send(cm_id, NULL, send_msg[send_buf_num], SIZE, mr, 0);
                send_buf_num++;
                send_buf_num %= NUM_BUFFERS;
                if (ret) {
                    perror("rdma_post_send");
                }
            }
            int finish=0;
            if(cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
               cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL)
                finish = 1;
            rdma_ack_cm_event(cm_event);
            if(finish) {
                goto out;
            }
        }

        //if the send completed
        if(fds[POLL_SEND_CQ].revents & POLLIN) {
            struct ibv_cq *cq;
            struct ibv_wc wc[10];
            void *context;
            int num_send = ibv_poll_cq(cm_id->send_cq, 10, &wc[0]);
            if(num_send == 0) fprintf(stderr, ".");
            for(int i=0; isend_cq_channel, &cq, &context);
                assert(cq == cm_id->send_cq);
                //our send completed, send some more right away
                fprintf(stderr, "rdma_post_send\n");
                ret = rdma_post_send(cm_id, NULL, send_msg[send_buf_num++], SIZE, mr, 0);
                send_buf_num %= NUM_BUFFERS;
                if (ret) {
                    perror("rdma_post_send");
                }
            }
            //expensive call, ack all received events together
            ibv_ack_cq_events(cm_id->send_cq, num_send);
            ibv_req_notify_cq(cm_id->send_cq, 0);
        }

        //if the receive completed, prepare to receive more
        if(fds[POLL_RECV_CQ].revents & POLLIN) {
            struct ibv_cq *cq;
            struct ibv_wc wc[10];
            void *context;
            int num_recv=ibv_poll_cq(cm_id->recv_cq, 10, &wc[0]);
            for(int i=0; irecv_cq_channel, &cq, &context);
                assert(cq == cm_id->recv_cq);
                //we received some payload, prepare to receive more
                fprintf(stderr, "rdma_post_recv\n");
                ret = rdma_post_recv(cm_id, NULL, recv_msg[recv_buf_num++], SIZE, mr);
                recv_buf_num %= NUM_BUFFERS;
                if (ret) {
                    perror("rdma_post_recv");
                }
            }
            //expensive call, ack all received events together
            ibv_ack_cq_events(cm_id->recv_cq, num_recv);
            ibv_req_notify_cq(cm_id->recv_cq, 0);
        }

        if(fds[POLL_WAKE].revents & POLLIN) {
            fprintf(stderr, "poll WAKE\n");
            char buffer[1];
            int nread = read(wake_fds[0], buffer, 1);
            fprintf(stderr, "Got Wake event %d\n", nread);
            goto out;
        }
    }

    out:
    rdma_disconnect(cm_id);
    rdma_dereg_mr(mr);
    rdma_destroy_ep(cm_id);
    free(buffer);
    fprintf(stderr, "poll: client completed\n");
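The symptom described above, poll() never blocking and the send channel fd always readable, is consistent with poll()'s level-triggered behaviour: a readable descriptor is reported again on every call until the pending data is actually read. The sketch below demonstrates this with a pipe standing in for the completion channel fd; the assumption is that the channel fd likewise stays readable until `ibv_get_cq_event()` reads the pending event. In the loop above, whenever `ibv_poll_cq()` returns 0 the `for` body never runs, so no event is consumed on that pass. `set_nonblocking`, `is_readable` and `consume_one` are hypothetical helpers; the `fcntl()` calls are the usual way to set O_NONBLOCK on any fd, including a completion channel's `fd`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 if poll() reports fd readable right now, 0 if not,
 * -1 on error. A timeout of 0 means this never blocks. */
int is_readable(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&p, 1, 0);
    if (n < 0)
        return -1;
    return (n > 0 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Read and discard one pending byte. Returns 1 if a byte was consumed,
 * 0 if nothing was pending (EAGAIN on a non-blocking fd), and -1 on
 * EOF or any other error. */
int consume_one(int fd)
{
    char c;
    ssize_t r = read(fd, &c, 1);
    if (r == 1)
        return 1;
    if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

After one byte is written to the pipe, `is_readable()` keeps returning 1 on every call until `consume_one()` reads it. Setting O_NONBLOCK only changes what a read does when nothing is pending (it fails with EAGAIN instead of blocking); it does not by itself stop the spin, because the pending event still has to be consumed.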
Re: [PATCH] librdmacm: fix make install
On 10/25/10 10:51, Jonathan Rosser wrote:
> Signed-off-by: Jonathan Rosser
> ---

Oh right. gmane obfuscates my address in the patch too. Ho-hum. Replace with jonathan.rosser @ rd. bbc. co. uk for a real address. Sorry about that.

Regards,
Jonathan.
Re: [PATCH] librdmacm: fix compiler warning of void* arithmetic
Arithmetic on void* pointers generates a compiler warning, and projects that include rdma/rdma_verbs.h and compile with -Werror -Wall will fail to build.

Signed-off-by: Jonathan Rosser
---
 include/rdma/rdma_verbs.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/rdma/rdma_verbs.h b/include/rdma/rdma_verbs.h
index d75d906..853ef9b 100644
--- a/include/rdma/rdma_verbs.h
+++ b/include/rdma/rdma_verbs.h
@@ -160,7 +160,7 @@ rdma_post_recv(struct rdma_cm_id *id, void *context, void *addr,
 {
 	struct ibv_sge sge;
 
-	assert((addr >= mr->addr) && ((addr + length) <= (mr->addr + mr->length)));
+	assert((addr >= mr->addr) && (((uint8_t*)addr + length) <= ((uint8_t*)mr->addr + mr->length)));
 	sge.addr = (uint64_t) (uintptr_t) addr;
 	sge.length = (uint32_t) length;
 	sge.lkey = mr->lkey;
-- 
1.7.0.4
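The patch casts to uint8_t* because pointer arithmetic on void* is a GNU C extension rather than standard C, and it is not valid C++ at all. A standalone sketch of the same bounds check follows, using the hypothetical name `range_in_region`. It does its arithmetic on `const uint8_t *` like the patch but, as a small variation, compares byte offsets instead of forming `addr + length`, so no pointer beyond the region is ever computed. Like the original assert, it assumes `addr` and `base` point into the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if [addr, addr + length) lies entirely inside
 * [base, base + base_len), 0 otherwise. Assumes both pointers refer to
 * the same buffer, as the assert in rdma_post_recv() does. */
int range_in_region(const void *addr, size_t length,
                    const void *base, size_t base_len)
{
    assert(addr != NULL && base != NULL);
    const uint8_t *a = (const uint8_t *)addr;   /* byte-sized arithmetic */
    const uint8_t *b = (const uint8_t *)base;

    if (a < b)
        return 0;
    size_t offset = (size_t)(a - b);
    if (offset > base_len)
        return 0;
    /* base_len - offset cannot underflow because offset <= base_len. */
    return length <= base_len - offset;
}
```

For a 64-byte buffer `buf`, `range_in_region(buf + 8, 16, buf, 64)` returns 1 and `range_in_region(buf + 60, 8, buf, 64)` returns 0.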
Re: [PATCH] librdmacm: fix make install
make install fails if the include files in the install prefix include/rdma,infiniband already exist. install claims that the and file are the same and exits with an error.

This patch modifies Makefile.am so that the rdma and infiniband include files explicitly reference the source directory rather than the build directory.

Also, EXTRA_DIST now only lists files that are not referenced anywhere else in Makefile.am.

Signed-off-by: Jonathan Rosser
---
 Makefile.am |   12 +++++-------
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 2668aa3..8790fe8 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -35,11 +35,11 @@ examples_rdma_server_LDADD = $(top_builddir)/src/librdmacm.la
 librdmacmincludedir = $(includedir)/rdma
 infinibandincludedir = $(includedir)/infiniband
 
-librdmacminclude_HEADERS = include/rdma/rdma_cma_abi.h \
-	include/rdma/rdma_cma.h \
-	include/rdma/rdma_verbs.h
+librdmacminclude_HEADERS = $(top_srcdir)/include/rdma/rdma_cma_abi.h \
+	$(top_srcdir)/include/rdma/rdma_cma.h \
+	$(top_srcdir)/include/rdma/rdma_verbs.h
 
-infinibandinclude_HEADERS = include/infiniband/ib.h
+infinibandinclude_HEADERS = $(top_srcdir)/include/infiniband/ib.h
 
 man_MANS = \
 	man/rdma_accept.3 \
@@ -97,9 +97,7 @@ man_MANS = \
 	man/rdma_client.1 \
 	man/rdma_cm.7
 
-EXTRA_DIST = include/rdma/rdma_cma_abi.h include/rdma/rdma_cma.h \
-include/infiniband/ib.h include/rdma/rdma_verbs.h \
-src/cma.h src/librdmacm.map librdmacm.spec.in $(man_MANS)
+EXTRA_DIST = src/cma.h src/librdmacm.map librdmacm.spec.in $(man_MANS)
 
 dist-hook: librdmacm.spec
 	cp librdmacm.spec $(distdir)
-- 
1.7.0.4
[PATCH] librdmacm: fix make install
make install fails if the include files in the install prefix include/rdma,infiniband already exist. install claims that the and file are the same and exits with an error.

This patch modifies Makefile.am so that the rdma and infiniband include files explicitly reference the source directory rather than the build directory.

Also, EXTRA_DIST now only lists files that are not referenced anywhere else in Makefile.am.
---
 Makefile.am |   12 +++++-------
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 2668aa3..8790fe8 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -35,11 +35,11 @@ examples_rdma_server_LDADD = $(top_builddir)/src/librdmacm.la
 librdmacmincludedir = $(includedir)/rdma
 infinibandincludedir = $(includedir)/infiniband
 
-librdmacminclude_HEADERS = include/rdma/rdma_cma_abi.h \
-	include/rdma/rdma_cma.h \
-	include/rdma/rdma_verbs.h
+librdmacminclude_HEADERS = $(top_srcdir)/include/rdma/rdma_cma_abi.h \
+	$(top_srcdir)/include/rdma/rdma_cma.h \
+	$(top_srcdir)/include/rdma/rdma_verbs.h
 
-infinibandinclude_HEADERS = include/infiniband/ib.h
+infinibandinclude_HEADERS = $(top_srcdir)/include/infiniband/ib.h
 
 man_MANS = \
 	man/rdma_accept.3 \
@@ -97,9 +97,7 @@ man_MANS = \
 	man/rdma_client.1 \
 	man/rdma_cm.7
 
-EXTRA_DIST = include/rdma/rdma_cma_abi.h include/rdma/rdma_cma.h \
-include/infiniband/ib.h include/rdma/rdma_verbs.h \
-src/cma.h src/librdmacm.map librdmacm.spec.in $(man_MANS)
+EXTRA_DIST = src/cma.h src/librdmacm.map librdmacm.spec.in $(man_MANS)
 
 dist-hook: librdmacm.spec
 	cp librdmacm.spec $(distdir)
-- 
1.7.0.4
[PATCH] librdmacm: fix compiler warning of void* arithmetic
Arithmetic on void* pointers generates a compiler warning, and projects that include rdma/rdma_verbs.h and compile with -Werror -Wall will fail to build.
---
 include/rdma/rdma_verbs.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/rdma/rdma_verbs.h b/include/rdma/rdma_verbs.h
index d75d906..853ef9b 100644
--- a/include/rdma/rdma_verbs.h
+++ b/include/rdma/rdma_verbs.h
@@ -160,7 +160,7 @@ rdma_post_recv(struct rdma_cm_id *id, void *context, void *addr,
 {
 	struct ibv_sge sge;
 
-	assert((addr >= mr->addr) && ((addr + length) <= (mr->addr + mr->length)));
+	assert((addr >= mr->addr) && (((uint8_t*)addr + length) <= ((uint8_t*)mr->addr + mr->length)));
 	sge.addr = (uint64_t) (uintptr_t) addr;
 	sge.length = (uint32_t) length;
 	sge.lkey = mr->lkey;
-- 
1.7.0.4
Re: [PATCH] librdmacm: Do not modify qp_init_attr in rdma_get_request
Hefty, Sean writes:
> I added a while(1) loop to rdma_server to allow clients to connect
> repeatedly, and this worked for me. Jonathan, can you see if this
> works for your testing as well? If so, I'll commit.

Yesterday I tried setting attr->send/recv_cq = NULL in rdma_get_request(), which fixes the bug in a somewhat ugly manner. Passing a copy of the attributes is a much tidier solution, and your patch works for me.

Many Thanks,
Jonathan.
Extending rdma_server example
Hi,

I took librdmacm/examples/rdma_server.c and converted it into a persistent server that serves successive requests. However, this does not work, and I have the following problem. Below are the fprintfs from my code (lower case) and from librdmacm (caps) for a first, successful connection and a second one, which causes a segfault:

    rdma_getaddrinfo
    rdma_listen
    rdma_getrequest            <- wait for first request
    RDMA CREATE QP
    UCMA CREATE_CQS CALLED
    UCMA CREATE CQS
    CREATING RECV_CQ           <- CQ created here OK for client connection
    CREATING SEND_CQ           <- this looks good
    IBV CREATE QP
    rdma_reg_msgs
    rdma_post_recv
    rdma_accept
    rdma_get_recv_comp
    rdma_post_send
    rdma_get_send_comp
    rdma_getrequest            <- wait for second request
    RDMA CREATE QP
    UCMA CREATE_CQS CALLED
    UCMA CREATE CQS            <- !!! no recv/send CQ created here
    IBV CREATE QP
    rdma_reg_msgs
    rdma_post_recv
    rdma_accept
    rdma_get_recv_comp         <- dereference of id->recv_cq = NULL
    Segmentation fault

So I'm wondering why the CQs do not get created the second time round. It looks like:

1) rdma_get_request passes event_id and (listen_id)id_priv->qp_init_attr to rdma_create_qp()
2) rdma_create_qp passes qp_init_attr to ucma_create_cqs()
3) ucma_create_cqs stores a pointer to the created CQs in attr->recv_cq / send_cq, which is the attr of the listen_id
4) when serving the second client, ucma_create_cqs checks attr->recv_cq and does not create a pair of CQs

It feels a little odd to me that the listen_id holds pointers to the CQs created for the client_id in its qp_init_attr. Can anyone enlighten me? Am I trying to do something (a persistent server) that rdma_get_request is not intended to do?

Cheers,
Jonathan Rosser
Senior R&D Engineer
BBC R&D
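Steps 1) to 4), and the fix discussed in the "Do not modify qp_init_attr" reply (passing a copy of the attributes), can be modelled in a few lines of C. This is a toy model, not librdmacm's code: `struct cq`, `struct qp_attr`, `struct conn`, `new_cq`, `create_cqs`, `get_request_shared` and `get_request_copy` are hypothetical names, and `create_cqs` only mimics the behaviour described above (create CQs when the attr has none, record them on the connection, and write their pointers back into the attr).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; not librdmacm's real types. */
struct cq { int id; };
struct qp_attr { struct cq *recv_cq; struct cq *send_cq; };
struct conn { struct cq *recv_cq; struct cq *send_cq; };

#define CQ_POOL_SIZE 16
static struct cq cq_pool[CQ_POOL_SIZE];
static int cq_used;

static struct cq *new_cq(void)
{
    assert(cq_used < CQ_POOL_SIZE);
    cq_pool[cq_used].id = cq_used + 1;
    return &cq_pool[cq_used++];
}

/* Mimics the described ucma_create_cqs(): create CQs only when attr
 * lacks them, store them on the connection, and write the pointers
 * back into *attr. */
void create_cqs(struct conn *id, struct qp_attr *attr)
{
    if (attr->recv_cq == NULL) {
        id->recv_cq = new_cq();
        attr->recv_cq = id->recv_cq;
    }
    if (attr->send_cq == NULL) {
        id->send_cq = new_cq();
        attr->send_cq = id->send_cq;
    }
}

/* Problem pattern: the listener's own attr is passed down and modified,
 * so the second request finds CQ pointers already set. */
void get_request_shared(struct conn *id, struct qp_attr *listen_attr)
{
    create_cqs(id, listen_attr);
}

/* Copy pattern: work on a local copy so the listener's attr is untouched. */
void get_request_copy(struct conn *id, const struct qp_attr *listen_attr)
{
    struct qp_attr attr = *listen_attr;
    create_cqs(id, &attr);
}
```

With the shared attr, the second connection is left with NULL CQ pointers, which mirrors the NULL `id->recv_cq` dereference in the log; with a copy, every request gets its own pair and the listener's attr stays empty.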