Re: asynchronous operation with poll()

2010-11-10 Thread Jonathan Rosser

On 11/10/10 10:30, Andrea Gozzelino wrote:

> Hi Jonathan,
>
> I wrote a test (latency and transfer speed) with RDMA.
> Server and client run the same code and exchange buffers of a defined
> size n times (in a loop). In makefile.txt you can find help on using
> the code.
>
> I tested Intel NetEffect NE020 E10G81GP cards with this code and I found
> a minimum latency of about 11 us, a maximum transfer speed of about
> 9,6 GBytes, and CPU usage of up to 90% on the client side.
> The last value is not good for us.


Hi Andrea,

Thanks for the code. With the advice from Jason I have changed my test
program to get reliable communication using 1 MByte buffers. CPU usage
is less than 2% on both client and server at 10 Gb throughput. I am
using a Chelsio S310CR.


I find the poll() approach more natural, as I already have experience
with conventional sockets-based programming.


The rdma_client/rdma_server example programs from librdmacm were the 
easiest to start from and I have incrementally changed them from 
synchronous to asynchronous operation, and moved the internals of the 
high level functions in  into my own code piece by 
piece. The learning curve is very steep :)


I found this paper quite interesting 
www.systems.ethz.ch/research/awards/minimizingthehidden.pdf


Cheers,
Jonathan.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: asynchronous operation with poll()

2010-11-10 Thread Jonathan Rosser

On 11/09/10 20:44, Jason Gunthorpe wrote:

> Broadly it looks to me like your actions are in the wrong order.
> A poll based RDMA loop should look like this:
>
> - exit poll
> - Check poll bit
> - call ibv_get_cq_event
> - call ibv_req_notify_cq
> - repeatedly call ibv_poll_cq (while rc == num requested)
> - Issue new work
> - return to poll
>
> Generally, for your own sanity, I recommend splitting into 3 functions
> - Do the stuff with ibv_get_cq_event
> - Drain and process WC's
> - Issue new work
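
The ordering above can be illustrated with a small simulation. This is only a sketch: struct fake_cq, fake_poll() and the BATCH size are hypothetical stand-ins so that the example compiles without libibverbs, and the comments name the real verbs each step would correspond to.

```c
#include <assert.h>

#define BATCH 10  /* number of completions requested per poll (assumed) */

/* Hypothetical stand-in for a CQ plus its completion channel. */
struct fake_cq {
    int pending;  /* work completions waiting to be polled */
    int events;   /* unread notifications on the channel fd */
    int armed;    /* 1 once a new notification has been requested */
};

/* Step 1: consume the notification, then re-arm before draining.
 * Real code: ibv_get_cq_event(), ibv_ack_cq_events(), ibv_req_notify_cq(). */
static void consume_event(struct fake_cq *cq)
{
    assert(cq->events > 0);
    cq->events--;
    cq->armed = 1;
}

/* Stand-in for ibv_poll_cq(): hands out at most max completions. */
static int fake_poll(struct fake_cq *cq, int max)
{
    int n = cq->pending < max ? cq->pending : max;
    cq->pending -= n;
    return n;
}

/* Step 2: drain, polling again for as long as a full batch comes back. */
static int drain_cq(struct fake_cq *cq)
{
    int total = 0;
    int n;

    do {
        n = fake_poll(cq, BATCH);
        total += n;  /* real code would process each WC here */
    } while (n == BATCH);
    return total;
}

/* Called when poll() reports the channel fd readable. Returns the number
 * of completions drained; step 3 (issuing new work) is left to the caller. */
int handle_cq_readable(struct fake_cq *cq)
{
    consume_event(cq);
    return drain_cq(cq);
}
```

Re-arming before the drain means a completion that arrives during the drain either gets polled or raises a fresh event, so it is not lost. Note that one event may cover many completions, or none.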



Hi Jason,

Thanks very much for your advice. I had misunderstood the relationship 
between CQ events and available WC's.



> doing multithreaded things. Using num_send is wrong, I use this:
>
> bool checkCQPoll(struct pollfd &p)

Right - using a function like your checkCQPoll has sorted out the 
behaviour of the poll() loop.



> Continually posting sends and recvs will get you into trouble, you
> will run out of recvs and get RNR's. These days the wisdom for
> implementing RDMA is that you should have explicit message flow

OK - I appreciate that a real world protocol ought to have flow control 
rather than just send as fast as possible. I've been trying to exercise 
the interfaces as far as possible and make sure my RDMA implementation 
is solid before building something real on top of it.



> control. I.e., for something simple like this you could say that getting
> a recv means another send is OK, but you still need a mechanism to
> wait for a send buffer to be returned on the send CQ - there is no
> ordering guarantee.
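
One way to read this advice, as a sketch only: a send may be posted when both a receive has arrived (treated here as a credit) and a send buffer has come back on the send CQ, whichever of the two happens first. struct flow_state and the three functions below are hypothetical names used for illustration. The assumption that the missing ordering is between recv completions and send completions is mine, not something the thread states.

```c
#include <assert.h>

/* Hypothetical bookkeeping for a simple credit scheme. */
struct flow_state {
    int credits;         /* receives seen from the peer and not yet spent */
    int free_send_bufs;  /* send buffers whose completion has been seen */
};

/* Call when a completion is drained from the recv CQ. */
void on_recv_completion(struct flow_state *s)
{
    s->credits++;
}

/* Call when a completion is drained from the send CQ. */
void on_send_completion(struct flow_state *s)
{
    s->free_send_bufs++;
}

/* Returns 1 and spends one credit and one buffer if a send may be posted
 * now, otherwise returns 0 and leaves the state untouched. */
int try_post_send(struct flow_state *s)
{
    assert(s->credits >= 0 && s->free_send_bufs >= 0);
    if (s->credits == 0 || s->free_send_bufs == 0)
        return 0;
    s->credits--;
    s->free_send_bufs--;
    return 1;  /* real code would call rdma_post_send() here */
}
```

Because the two counters are independent, the logic does not care whether the recv completion or the send completion is seen first.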


Could I get some clarification on where there is no ordering guarantee? 
The WC's do not necessarily come back in the order that the sends were 
posted?



Many Thanks,
Jonathan.



asynchronous operation with poll()

2010-11-09 Thread Jonathan Rosser
I have client and server test programs that explore fully asynchronous
communication, written to be as close to a conventional sockets
application as possible, and I am encountering difficulty.


Both programs run the same code in a thread, sending buffers to each 
other as fast as possible. On the client side only, my poll() call never 
blocks and cm_id->send_cq_channel->fd always seems to be readable. This 
causes the program to loop wildly and consume 100% CPU.


Any ideas? I have ensured that O_NONBLOCK is set on the underlying file
descriptors. I'm not sure why the server side should run with almost no
CPU usage yet the client does not.
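
A possible explanation, consistent with Jason Gunthorpe's reply in this thread, is that poll() is level-triggered: a descriptor keeps reporting POLLIN until whatever is queued on it has been read. The sketch below shows that behaviour with an ordinary pipe standing in for the completion channel fd. is_readable() and count_spins() are hypothetical helpers, and no RDMA calls are involved.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if fd has unread data right now, 0 if not, -1 on error.
 * Uses a zero timeout so it never blocks. */
int is_readable(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&p, 1, 0);

    if (rc < 0)
        return -1;
    return (p.revents & POLLIN) ? 1 : 0;
}

/* Counts how many times in a row poll() reports fd readable when the
 * caller never reads from it, giving up after max_iters. */
int count_spins(int fd, int max_iters)
{
    int spins = 0;

    assert(max_iters >= 0);
    while (spins < max_iters && is_readable(fd) == 1)
        spins++;
    return spins;
}
```

After one byte is written to a pipe, count_spins() on the read end reaches max_iters every time and stops only once the byte has been read. By analogy, a completion channel fd would stay readable until its event is consumed with ibv_get_cq_event().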


Here is the client/server loop:


  struct ibv_mr *mr;
  int ret;
  int send_buf_num = 0;
  int recv_buf_num = 0;

  #define NUM_BUFFERS 20
  #define SIZE (1024*1024)
  uint8_t *buffer = (uint8_t*)malloc(SIZE * NUM_BUFFERS * 2);
  uint8_t *send_msg[NUM_BUFFERS];
  uint8_t *recv_msg[NUM_BUFFERS];

  for(int i=0; i<[...]->recv_cq, 0);
  //

  //
  // main loop
  while(ret == 0)
  {
    memset(fds, 0, sizeof(pollfd) * NUM_FDS);
    fds[POLL_CM].fd = cm_channel->fd;
    fds[POLL_CM].events = POLLIN;

    fds[POLL_RECV_CQ].fd = cm_id->recv_cq_channel->fd;
    fds[POLL_RECV_CQ].events = POLLIN;

    fds[POLL_SEND_CQ].fd = cm_id->send_cq_channel->fd;
    fds[POLL_SEND_CQ].events = POLLIN;

    fds[POLL_WAKE].fd = wake_fds[0];
    fds[POLL_WAKE].events = POLLIN;

    int nready = poll(fds, NUM_FDS, -1);
    if(nready < 0) {
      perror("poll");
    }

    if(fds[POLL_CM].revents & POLLIN) {
      struct rdma_cm_event *cm_event;
      ret = rdma_get_cm_event(cm_channel, &cm_event);
      if(ret) {
        perror("client connection rdma_get_cm_event");
      }
      fprintf(stderr, "Got cm event %s\n", rdma_event_str(cm_event->event));

      if(cm_event->event == RDMA_CM_EVENT_ESTABLISHED) {
        //send as soon as we are connected
        ibv_req_notify_cq(cm_id->send_cq, 0);
        ret = rdma_post_send(cm_id, NULL, send_msg[send_buf_num], SIZE, mr, 0);
        send_buf_num++;
        send_buf_num %= NUM_BUFFERS;
        if (ret) {
          perror("rdma_post_send");
        }
      }

      int finish=0;
      if(cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL)
        finish = 1;

      rdma_ack_cm_event(cm_event);
      if(finish) {
        goto out;
      }
    }

    //if the send completed
    if(fds[POLL_SEND_CQ].revents & POLLIN) {
      struct ibv_cq *cq;
      struct ibv_wc wc[10];
      void *context;
      int num_send = ibv_poll_cq(cm_id->send_cq, 10, &wc[0]);

      if(num_send == 0) fprintf(stderr, ".");

      for(int i=0; i<num_send; i++) {
        ibv_get_cq_event(cm_id->send_cq_channel, &cq, &context);
        assert(cq == cm_id->send_cq);

        //our send completed, send some more right away
        fprintf(stderr, "rdma_post_send\n");
        ret = rdma_post_send(cm_id, NULL, send_msg[send_buf_num++], SIZE, mr, 0);
        send_buf_num %= NUM_BUFFERS;
        if (ret) {
          perror("rdma_post_send");
        }
      }

      //expensive call, ack all received events together
      ibv_ack_cq_events(cm_id->send_cq, num_send);
      ibv_req_notify_cq(cm_id->send_cq, 0);
    }

    //if the receive completed, prepare to receive more
    if(fds[POLL_RECV_CQ].revents & POLLIN) {
      struct ibv_cq *cq;
      struct ibv_wc wc[10];
      void *context;
      int num_recv = ibv_poll_cq(cm_id->recv_cq, 10, &wc[0]);

      for(int i=0; i<num_recv; i++) {
        ibv_get_cq_event(cm_id->recv_cq_channel, &cq, &context);
        assert(cq == cm_id->recv_cq);

        //we received some payload, prepare to receive more
        fprintf(stderr, "rdma_post_recv\n");
        ret = rdma_post_recv(cm_id, NULL, recv_msg[recv_buf_num++], SIZE, mr);
        recv_buf_num %= NUM_BUFFERS;
        if (ret) {
          perror("rdma_post_recv");
        }
      }

      //expensive call, ack all received events together
      ibv_ack_cq_events(cm_id->recv_cq, num_recv);
      ibv_req_notify_cq(cm_id->recv_cq, 0);
    }

    if(fds[POLL_WAKE].revents & POLLIN) {
      fprintf(stderr, "poll WAKE\n");
      char buffer[1];
      int nread = read(wake_fds[0], buffer, 1);
      fprintf(stderr, "Got Wake event %d\n", nread);
      goto out;
    }

  }

out:
  rdma_disconnect(cm_id);
  rdma_dereg_mr(mr);
  rdma_destroy_ep(cm_id);

  free(buffer);
  fprintf(stderr, "poll: client completed\n");






Re: [PATCH] librdmacm: fix make install

2010-10-25 Thread Jonathan Rosser

On 10/25/10 10:51, Jonathan Rosser wrote:


Signed-off-by: Jonathan Rosser

---


Oh right. gmane obfuscates my address in the patch too. Ho-hum.

Replace with jonathan.rosser @ rd. bbc. co. uk for a real address.

Sorry about that.

Regards,
Jonathan.




Re: [PATCH] librdmacm: fix compiler warning of void* arithmetic

2010-10-25 Thread Jonathan Rosser

Arithmetic on void* pointers generates a compiler warning, and projects
that include rdma/rdma_verbs.h and compile with  -Werror -Wall will fail
to build.

Signed-off-by: Jonathan Rosser 
---
 include/rdma/rdma_verbs.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/rdma/rdma_verbs.h b/include/rdma/rdma_verbs.h
index d75d906..853ef9b 100644
--- a/include/rdma/rdma_verbs.h
+++ b/include/rdma/rdma_verbs.h
@@ -160,7 +160,7 @@ rdma_post_recv(struct rdma_cm_id *id, void *context, void *addr,
 {
 	struct ibv_sge sge;
 
-	assert((addr >= mr->addr) && ((addr + length) <= (mr->addr + mr->length)));
+	assert((addr >= mr->addr) && (((uint8_t*)addr + length) <= ((uint8_t*)mr->addr + mr->length)));
 	sge.addr = (uint64_t) (uintptr_t) addr;
 	sge.length = (uint32_t) length;
 	sge.lkey = mr->lkey;
--
1.7.0.4
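
The cast used in the patch can be tried outside librdmacm. A minimal sketch follows: range_within() is a hypothetical helper, not part of rdma_verbs.h. It performs the same containment check on uint8_t pointers, rearranged to compare offsets so that no pointer past the end of the region is ever formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if [addr, addr + length) lies inside [base, base + base_len),
 * otherwise 0. Both pointers are assumed to refer to the same allocation.
 * Arithmetic is done on uint8_t pointers; arithmetic on void * is a GNU
 * extension and triggers a warning with -Wpointer-arith or -pedantic. */
int range_within(const void *addr, size_t length,
                 const void *base, size_t base_len)
{
    const uint8_t *a = addr;
    const uint8_t *b = base;
    size_t offset;

    assert(a != NULL && b != NULL);
    if (a < b)
        return 0;
    offset = (size_t)(a - b);
    return offset <= base_len && length <= base_len - offset;
}
```

For a 16-byte buffer buf, range_within(buf + 4, 8, buf, 16) holds, while range_within(buf + 8, 12, buf, 16) does not.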





Re: [PATCH] librdmacm: fix make install

2010-10-25 Thread Jonathan Rosser

make install fails if the include files in the install prefix
include/rdma,infiniband already exist. install claims that the 
and  file are the same and exits with an error.

This patch modifies Makefile.am so that the rdma and infiniband include
files explicitly reference the source directory rather than the build
directory.

Also, EXTRA_DIST now only lists files that are not referenced anywhere
else in Makefile.am

Signed-off-by: Jonathan Rosser 
---
 Makefile.am |   12 +---
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 2668aa3..8790fe8 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -35,11 +35,11 @@ examples_rdma_server_LDADD = $(top_builddir)/src/librdmacm.la
 librdmacmincludedir = $(includedir)/rdma
 infinibandincludedir = $(includedir)/infiniband

-librdmacminclude_HEADERS = include/rdma/rdma_cma_abi.h \
-  include/rdma/rdma_cma.h \
-  include/rdma/rdma_verbs.h
+librdmacminclude_HEADERS = $(top_srcdir)/include/rdma/rdma_cma_abi.h \
+  $(top_srcdir)/include/rdma/rdma_cma.h \
+  $(top_srcdir)/include/rdma/rdma_verbs.h

-infinibandinclude_HEADERS = include/infiniband/ib.h
+infinibandinclude_HEADERS = $(top_srcdir)/include/infiniband/ib.h

 man_MANS = \
man/rdma_accept.3 \
@@ -97,9 +97,7 @@ man_MANS = \
man/rdma_client.1 \
man/rdma_cm.7

-EXTRA_DIST = include/rdma/rdma_cma_abi.h include/rdma/rdma_cma.h \
-include/infiniband/ib.h include/rdma/rdma_verbs.h \
-src/cma.h src/librdmacm.map librdmacm.spec.in $(man_MANS)
+EXTRA_DIST = src/cma.h src/librdmacm.map librdmacm.spec.in $(man_MANS)

 dist-hook: librdmacm.spec
cp librdmacm.spec $(distdir)
--
1.7.0.4



[PATCH] librdmacm: fix make install

2010-10-22 Thread Jonathan Rosser

make install fails if the include files in the install prefix
include/rdma,infiniband already exist. install claims that the 
and  file are the same and exits with an error.

This patch modifies Makefile.am so that the rdma and infiniband include
files explicitly reference the source directory rather than the build
directory.

Also, EXTRA_DIST now only lists files that are not referenced anywhere
else in Makefile.am
---
 Makefile.am |   12 +---
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 2668aa3..8790fe8 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -35,11 +35,11 @@ examples_rdma_server_LDADD = $(top_builddir)/src/librdmacm.la
 librdmacmincludedir = $(includedir)/rdma
 infinibandincludedir = $(includedir)/infiniband

-librdmacminclude_HEADERS = include/rdma/rdma_cma_abi.h \
-  include/rdma/rdma_cma.h \
-  include/rdma/rdma_verbs.h
+librdmacminclude_HEADERS = $(top_srcdir)/include/rdma/rdma_cma_abi.h \
+  $(top_srcdir)/include/rdma/rdma_cma.h \
+  $(top_srcdir)/include/rdma/rdma_verbs.h

-infinibandinclude_HEADERS = include/infiniband/ib.h
+infinibandinclude_HEADERS = $(top_srcdir)/include/infiniband/ib.h

 man_MANS = \
man/rdma_accept.3 \
@@ -97,9 +97,7 @@ man_MANS = \
man/rdma_client.1 \
man/rdma_cm.7

-EXTRA_DIST = include/rdma/rdma_cma_abi.h include/rdma/rdma_cma.h \
-include/infiniband/ib.h include/rdma/rdma_verbs.h \
-src/cma.h src/librdmacm.map librdmacm.spec.in $(man_MANS)
+EXTRA_DIST = src/cma.h src/librdmacm.map librdmacm.spec.in $(man_MANS)

 dist-hook: librdmacm.spec
cp librdmacm.spec $(distdir)
--
1.7.0.4



[PATCH] librdmacm: fix compiler warning of void* arithmetic

2010-10-22 Thread Jonathan Rosser

Arithmetic on void* pointers generates a compiler warning, and projects
that include rdma/rdma_verbs.h and compile with  -Werror -Wall will fail
to build.
---
 include/rdma/rdma_verbs.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/rdma/rdma_verbs.h b/include/rdma/rdma_verbs.h
index d75d906..853ef9b 100644
--- a/include/rdma/rdma_verbs.h
+++ b/include/rdma/rdma_verbs.h
@@ -160,7 +160,7 @@ rdma_post_recv(struct rdma_cm_id *id, void *context, void *addr,
 {
 	struct ibv_sge sge;
 
-	assert((addr >= mr->addr) && ((addr + length) <= (mr->addr + mr->length)));
+	assert((addr >= mr->addr) && (((uint8_t*)addr + length) <= ((uint8_t*)mr->addr + mr->length)));
 	sge.addr = (uint64_t) (uintptr_t) addr;
 	sge.length = (uint32_t) length;
 	sge.lkey = mr->lkey;
--
1.7.0.4





Re: [PATCH] librdmacm: Do not modify qp_init_attr in rdma_get_request

2010-10-19 Thread Jonathan Rosser
Hefty, Sean  writes:

> I added a while(1) loop to rdma_server to allow clients to connect
> repeatedly, and this worked for me.  Jonathan, can you see if this
> works for your testing as well?  If so, I'll commit.

Yesterday I tried setting attr->send/recv_cq = NULL in rdma_get_request() which 
fixes the bug in a somewhat ugly manner. Passing a copy of the attributes is a 
much tidier solution, and your patch works for me.

Many Thanks,
Jonathan.



Extending rdma_server example

2010-10-18 Thread Jonathan Rosser
Hi,

I took librdmacm/examples/rdma_server.c and converted it to be a persistent
server to serve successive requests. However, this does not work and I have 
the following problem:

Below are the fprintf traces from my code (lower case) and from librdmacm
(caps) for a first, successful connection and a second one, which causes a
segfault:

rdma_getaddrinfo
rdma_listen
rdma_getrequest  <- wait for first request
RDMA CREATE QP
UCMA CREATE_CQS
CALLED UCMA CREATE CQS
CREATING RECV_CQ <- CQ created here OK for client connection
CREATING SEND_CQ <- this looks good
IBV CREATE QP
rdma_reg_msgs
rdma_post_recv
rdma_accept
rdma_get_recv_comp
rdma_post_send
rdma_get_send_comp
rdma_getrequest  <- wait for second request
RDMA CREATE QP
UCMA CREATE_CQS
CALLED UCMA CREATE CQS   <- !!! no recv/send CQ created here
IBV CREATE QP
rdma_reg_msgs
rdma_post_recv
rdma_accept
rdma_get_recv_comp   <- dereference of id->recv_cq = NULL
Segmentation fault

So I'm wondering why the CQs do not get created the second time round, and
it looks like:

1) rdma_get_request passes event_id and (listen_id)id_priv->qp_init_attr to
   rdma_create_qp()

2) rdma_create_qp passes qp_init_attr to ucma_create_cqs()

3) ucma_create_cqs stores a pointer to the created CQs in attr->recv_cq /
   send_cq, which is the attr of the listen_id

4) serving the second client, ucma_create_cqs checks attr->recv_cq and does 
   not create a pair of CQs

Feels a little odd to me that the listen_id takes a pointer to the CQs created
for the client_id in its qp_init_attr.
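
The four steps above can be modelled in a few lines. This is a toy model only: fake_attr, fake_id and fake_create_qp() are hypothetical stand-ins that mimic the behaviour described, not librdmacm code. accept_with_copy() shows one possible way around the problem, passing a private copy of the attributes, which is the approach mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

struct fake_attr { int *recv_cq; };                  /* stands in for qp_init_attr */
struct fake_id   { int cq_storage; int *recv_cq; };  /* stands in for rdma_cm_id */

/* Mimics steps 2-4: create a CQ for id only when attr->recv_cq is unset,
 * and record the new CQ in attr as a side effect. */
static void fake_create_qp(struct fake_id *id, struct fake_attr *attr)
{
    assert(id != NULL && attr != NULL);
    if (attr->recv_cq == NULL) {
        id->recv_cq = &id->cq_storage;
        attr->recv_cq = id->recv_cq;
    }
}

/* Step 1 as described: every connection is handed the listener's attr,
 * so the first connection's CQ pointer leaks into it. */
void accept_shared(struct fake_id *id, struct fake_attr *listen_attr)
{
    id->recv_cq = NULL;
    fake_create_qp(id, listen_attr);
}

/* Alternative: hand over a private copy so the listener's attr stays clean. */
void accept_with_copy(struct fake_id *id, const struct fake_attr *listen_attr)
{
    struct fake_attr copy = *listen_attr;

    id->recv_cq = NULL;
    fake_create_qp(id, &copy);
}
```

With accept_shared(), the second connection ends up with recv_cq == NULL, matching the trace above. With accept_with_copy(), every connection gets its own CQ.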

Can anyone enlighten me? Am I trying to do something (persistent server) that
rdma_get_request is not intended to do?

Cheers,
Jonathan Rosser
Senior R&D Engineer
BBC R&D
