Re: [ceph-users] Help needed porting Ceph to RSockets

2013-09-12 Thread Andreas Bluemle
On Thu, 12 Sep 2013 12:20:03 +0200
Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com> wrote:

 2013/9/10 Andreas Bluemle <andreas.blue...@itxperts.de>:
  Since I have added these workarounds to my version of the librdmacm
  library, I can at least start up ceph using LD_PRELOAD and end up in
  a healthy ceph cluster state.
 
 Have you seen any performance improvement by using LD_PRELOAD with
 ceph? Which throughput are you able to achieve with rsockets and ceph?

I have not yet done any performance testing.

The next step I have to take is more related to setting up
a larger cluster with something like 150 OSDs without hitting any
resource limitations.

Regards

Andreas Bluemle




-- 
Andreas Bluemle <andreas.blue...@itxperts.de>
ITXperts GmbH   http://www.itxperts.de
Balanstrasse 73, Geb. 08Phone: (+49) 89 89044917
D-81541 Muenchen (Germany)  Fax:   (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm


Re: [ceph-users] Help needed porting Ceph to RSockets

2013-09-10 Thread Andreas Bluemle
Hi,

after some more analysis and debugging, I found
workarounds for my problems; I have added these workarounds
to the latest version of Sean's patch for the poll problem;
see the attachment to this posting.

The shutdown() operations below are all SHUT_RDWR.

1. shutdown() on side A of a connection waits for close() on side B

   With rsockets, when a shutdown is done on side A of a socket
   connection, then the shutdown will only return after side B
   has done a close() on its end of the connection.

   This is different from TCP/IP sockets: there, a shutdown causes
   the other end to terminate the connection at the TCP level
   immediately. The socket changes state to CLOSE_WAIT, which indicates
   that the application-level close is outstanding.

   In the attached patch, the workaround is in rs_poll_cq(),
   case RS_OP_CTRL, where for a RS_CTRL_DISCONNECT the rshutdown()
   is called on side B; this causes the termination of the
   socket connection to be acknowledged to side A, so the shutdown()
   there can now return.
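
   In pseudo-code, the idea is something like the following; this is
   only a sketch (the identifiers follow rsocket.c, but the literal
   change is in the attached patch, and using rs->index as the
   rsocket's fd is my assumption):

       /* Sketch of workaround 1, not the literal patch code. When
        * side B receives the remote disconnect control message, shut
        * down side B locally so that the pending shutdown() on side A
        * can return. */
       case RS_OP_CTRL:
               if (rs_msg_data(msg) == RS_CTRL_DISCONNECT) {
                       rs->state = rs_disconnected;
                       rshutdown(rs->index, SHUT_RDWR);  /* ack to side A */
                       return 0;
               }
               break;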

2. double (multiple) shutdown on side A: delay on 2nd shutdown

   When an application does a shutdown() of side A and does a 2nd
   shutdown() shortly after (for whatever reason), the return
   of the 2nd shutdown() is delayed by 2 seconds.

   The delay happens in rdma_disconnect(), when this is called
   from rshutdown() in the case that the rsocket state is
   rs_disconnected.

   Even if it could be considered an application bug to call
   shutdown() twice on the same socket, it still does not make
   sense to delay that 2nd call to shutdown().

   To work around this, I have
   - introduced an additional rsocket state: rs_shutdown
   - switched to that new state in rshutdown() at the very end
     of the function.

   The first call to shutdown() will therefore switch to the new
   rsocket state rs_shutdown - and any further call to rshutdown()
   will not do anything any more, because every effect of rshutdown()
   only happens if the rsocket state is either rs_connected or
   rs_disconnected. It would be cleaner, though, to check the
   rsocket state explicitly at the beginning of the function and
   return immediately if the state is rs_shutdown.
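
   As a sketch (assuming the new state value; the actual code is in
   the attached patch), such a guard would look like this:

       /* Sketch of workaround 2, not the literal patch code. */
       int rshutdown(int socket, int how)
       {
               struct rsocket *rs = idm_lookup(&idm, socket);

               if (rs->state == rs_shutdown)
                       return 0;         /* 2nd shutdown(): no-op */

               /* ... existing shutdown/disconnect handling ... */

               rs->state = rs_shutdown;  /* switch state at the very end */
               return 0;
       }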

Since I have added these workarounds to my version of the librdmacm
library, I can at least start up ceph using LD_PRELOAD and end up in
a healthy ceph cluster state.

I would not call these workarounds a real fix, but they should point
out the problems which I am trying to solve.


Regards

Andreas Bluemle




On Fri, 23 Aug 2013 00:35:22 +
Hefty, Sean <sean.he...@intel.com> wrote:

  I tested out the patch and unfortunately had the same results as
  Andreas. About 50% of the time the rpoll() thread in Ceph still
  hangs when rshutdown() is called. I saw a similar behaviour when
  increasing the poll time on the pre-patched version if that's of
  any relevance.
 
 I'm not optimistic, but here's an updated patch.  I attempted to
 handle more shutdown conditions, but I can't say that any of those
 would prevent the hang that you see.
 
 I have a couple of questions: 
 
 Is there any chance that the code would call rclose while rpoll
 is still running?  Also, can you verify that the thread is in the
 real poll() call when the hang occurs?
 
 Signed-off-by: Sean Hefty <sean.he...@intel.com>
 ---
  src/rsocket.c |   35 +++++++++++++++++++++++++----------
  1 files changed, 25 insertions(+), 10 deletions(-)
 
 diff --git a/src/rsocket.c b/src/rsocket.c
 index d544dd0..f94ddf3 100644
 --- a/src/rsocket.c
 +++ b/src/rsocket.c
 @@ -1822,7 +1822,12 @@ static int rs_poll_cq(struct rsocket *rs)
             rs->state = rs_disconnected;
             return 0;
         } else if (rs_msg_data(msg) == RS_CTRL_SHUTDOWN) {
 -           rs->state &= ~rs_readable;
 +           if (rs->state & rs_writable) {
 +               rs->state &= ~rs_readable;
 +           } else {
 +               rs->state = rs_disconnected;
 +               return 0;
 +           }
         }
         break;
     case RS_OP_WRITE:
 @@ -2948,10 +2953,12 @@ static int rs_poll_events(struct pollfd *rfds, struct pollfd *fds, nfds_t nfds)
 
         rs = idm_lookup(&idm, fds[i].fd);
         if (rs) {
 +           fastlock_acquire(&rs->cq_wait_lock);
             if (rs->type == SOCK_STREAM)
                 rs_get_cq_event(rs);
             else
                 ds_get_cq_event(rs);
 +           fastlock_release(&rs->cq_wait_lock);
             fds[i].revents = rs_poll_rs(rs, fds[i].events, 1, rs_poll_all);
         } else {
             fds[i].revents = rfds[i].revents;

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Andreas Bluemle
Hi Sean,

I will re-check until the end of the week; there is
some test scheduling issue with our test system, which
affects my access times.

Thanks

Andreas


On Mon, 19 Aug 2013 17:10:11 +
Hefty, Sean <sean.he...@intel.com> wrote:

 Can you see if the patch below fixes the hang?
 
 Signed-off-by: Sean Hefty <sean.he...@intel.com>
 ---
  src/rsocket.c |   11 ++++++++++-
  1 files changed, 10 insertions(+), 1 deletions(-)
 
 diff --git a/src/rsocket.c b/src/rsocket.c
 index d544dd0..e45b26d 100644
 --- a/src/rsocket.c
 +++ b/src/rsocket.c
 @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd *rfds, struct pollfd *fds, nfds_t nfds)
 
         rs = idm_lookup(&idm, fds[i].fd);
         if (rs) {
 +           fastlock_acquire(&rs->cq_wait_lock);
             if (rs->type == SOCK_STREAM)
                 rs_get_cq_event(rs);
             else
                 ds_get_cq_event(rs);
 +           fastlock_release(&rs->cq_wait_lock);
             fds[i].revents = rs_poll_rs(rs, fds[i].events, 1, rs_poll_all);
         } else {
             fds[i].revents = rfds[i].revents;
 @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set *writefds,
 
  /*
   * For graceful disconnect, notify the remote side that we're
 - * disconnecting and wait until all outstanding sends complete.
 + * disconnecting and wait until all outstanding sends complete, provided
 + * that the remote side has not sent a disconnect message.
   */
  int rshutdown(int socket, int how)
  {
 @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how)
     if (rs->state & rs_connected)
         rs_process_cq(rs, 0, rs_conn_all_sends_done);
 
 +   if (rs->state & rs_disconnected) {
 +       /* Generate event by flushing receives to unblock rpoll */
 +       ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
 +       rdma_disconnect(rs->cm_id);
 +   }
 +
     if ((rs->fd_flags & O_NONBLOCK) && (rs->state & rs_connected))
         rs_set_nonblocking(rs, rs->fd_flags);
  
 
 
 
 



-- 
Andreas Bluemle <andreas.blue...@itxperts.de>
Heinrich Boell Strasse 88   Phone: (+49) 89 4317582
D-81829 Muenchen (Germany)  Mobil: (+49) 177 522 0151


Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Andreas Bluemle
Hi,

I have added the patch and re-tested: I still encounter
hangs of my application. I am not quite sure whether I hit
the same error on shutdown, because I no longer hit the
error every time, but only every now and then.

When applying the patch to my code base (git tag v1.0.17) I notice
an offset of -34 lines. Which code base are you using?


Best Regards

Andreas Bluemle

On Tue, 20 Aug 2013 09:21:13 +0200
Andreas Bluemle <andreas.blue...@itxperts.de> wrote:

 Hi Sean,
 
 I will re-check until the end of the week; there is
 some test scheduling issue with our test system, which
 affects my access times.
 
 Thanks
 
 Andreas
 
 
 On Mon, 19 Aug 2013 17:10:11 +
 Hefty, Sean <sean.he...@intel.com> wrote:
 
  Can you see if the patch below fixes the hang?
  
  Signed-off-by: Sean Hefty <sean.he...@intel.com>
  ---
   src/rsocket.c |   11 ++++++++++-
   1 files changed, 10 insertions(+), 1 deletions(-)
  
  diff --git a/src/rsocket.c b/src/rsocket.c
  index d544dd0..e45b26d 100644
  --- a/src/rsocket.c
  +++ b/src/rsocket.c
  @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd *rfds, struct pollfd *fds, nfds_t nfds)
  
          rs = idm_lookup(&idm, fds[i].fd);
          if (rs) {
  +           fastlock_acquire(&rs->cq_wait_lock);
              if (rs->type == SOCK_STREAM)
                  rs_get_cq_event(rs);
              else
                  ds_get_cq_event(rs);
  +           fastlock_release(&rs->cq_wait_lock);
              fds[i].revents = rs_poll_rs(rs, fds[i].events, 1, rs_poll_all);
          } else {
              fds[i].revents = rfds[i].revents;
  @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set *writefds,
  
   /*
    * For graceful disconnect, notify the remote side that we're
  - * disconnecting and wait until all outstanding sends complete.
  + * disconnecting and wait until all outstanding sends complete, provided
  + * that the remote side has not sent a disconnect message.
    */
   int rshutdown(int socket, int how)
   {
  @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how)
      if (rs->state & rs_connected)
          rs_process_cq(rs, 0, rs_conn_all_sends_done);
  
  +   if (rs->state & rs_disconnected) {
  +       /* Generate event by flushing receives to unblock rpoll */
  +       ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
  +       rdma_disconnect(rs->cm_id);
  +   }
  +
      if ((rs->fd_flags & O_NONBLOCK) && (rs->state & rs_connected))
          rs_set_nonblocking(rs, rs->fd_flags);
   
  
  
  
  
 
 
 



-- 
Andreas Bluemle <andreas.blue...@itxperts.de>
Heinrich Boell Strasse 88   Phone: (+49) 89 4317582
D-81829 Muenchen (Germany)  Mobil: (+49) 177 522 0151


Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-14 Thread Andreas Bluemle
 into consideration what granularity the OS
  provides) and then call ibv_poll_cq(). Keep in mind, polling will
  prevent your CPU from reducing power.
  
  If the real poll() is actually checking for something (e.g.
  checking on the RDMA channel's fd or the IB channel's fd), then you
  may not want to spin too much.
 
 The real poll() call is intended to block the application until a
 timeout occurs or an event shows up.  Since increasing the spin time
 works for you, it makes me suspect that there is a bug in the CQ
 event handling in rsockets. 

   What's particularly weird is that the monitor receives a POLLHUP
   event when the ceph command shuts down its socket, but the ceph
   command never does. When using regular sockets, both sides of the
   connection receive a POLLIN | POLLHUP | POLLRDHUP event when the
   sockets are shut down. It would seem that there is a bug in
   rsockets which causes the side that calls shutdown first not to
   receive the correct rpoll events.
 
 rsockets does not support POLLRDHUP.
 

I don't think the issue is POLLRDHUP.
I think the issue is POLLHUP and/or POLLIN.
My impression is that a poll for POLLIN on a locally shut down
(r)socket should at least return a POLLIN event, and a subsequent
read should return 0 bytes, indicating EOF. But as far as I can
tell, the POLLIN is not generated by the layer below rsockets
(ib_uverbs.ko?).

See also: http://www.greenend.org.uk/rjk/tech/poll.html
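
For comparison, here is the TCP behaviour I would expect, as a small
self-contained sketch using plain sockets (this is an illustration of
standard poll() semantics, not rsockets code):

    #include <poll.h>
    #include <unistd.h>
    #include <sys/socket.h>

    /* After a local shutdown(SHUT_RDWR) on a connected TCP socket,
     * poll() reports POLLIN (usually together with POLLHUP), and a
     * subsequent read() returns 0, i.e. EOF. */
    static ssize_t read_after_shutdown(int fd)
    {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            char buf[1];

            shutdown(fd, SHUT_RDWR);
            if (poll(&pfd, 1, 1000) > 0 && (pfd.revents & POLLIN))
                    return read(fd, buf, sizeof(buf));  /* expect 0 */
            return -1;
    }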

Best Regards

Andreas Bluemle

 - Sean
 
 




-- 
Andreas Bluemle <andreas.blue...@itxperts.de>
ITXperts GmbH   http://www.itxperts.de
Balanstrasse 73, Geb. 08Phone: (+49) 89 89044917
D-81541 Muenchen (Germany)  Fax:   (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm


Re: using rsockets via librspreload: poll() support?

2013-08-08 Thread Andreas Bluemle
Hi Sean,

I begin to believe that this may be a more general
problem: it seems to me that errno is not always
initialized to 0 when the librspreload wrapper for
a socket system call, or the corresponding r*()
routine from rsocket.c, is called.

For the poll() I have cleared errno explicitly
before polling the socket - and it is still cleared
on return from poll(). Hence, where I used to encounter
an EOPNOTSUPP, I now see errno 0 (i.e. Success).
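
The check I used looks roughly like this (a sketch; checked_poll is
an illustrative name, not code from librspreload):

    #include <errno.h>
    #include <poll.h>
    #include <stdio.h>

    /* errno is only meaningful when the call itself fails, so clear
     * it before the call and interpret it only together with the
     * return value; a stale EOPNOTSUPP then shows up as errno 0. */
    static int checked_poll(struct pollfd *fds, nfds_t nfds, int timeout)
    {
            int rc;

            errno = 0;
            rc = poll(fds, nfds, timeout);
            if (rc < 0)
                    perror("poll");  /* errno is valid only on failure */
            return rc;
    }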


Best Regards

Andreas Bluemle


On Thu, 8 Aug 2013 17:46:29 +0200
Andreas Bluemle <andreas.blue...@itxperts.de> wrote:

 Hi Sean,
 
 I am currently testing rsockets in connection with ceph.
 I am using LD_PRELOAD and the librspreload.so to force
 the application (ceph) to use rsockets instead of regular
 tcp/ip sockets.
 
 All this works pretty well - until the point where an
 established connection is shut down: this does not seem to
 work and never finishes (unless the application is killed...).
 
 Ceph uses its sockets in nonblocking mode.
 When reading from a socket, it polls the socket first
 with an event mask of POLLIN and POLLRDHUP.
 
 On the return from the poll() I see that
   - POLLIN and POLLHUP are set in the returned events
 (POLLRDHUP is *not* set)
   - errno is 95 (EOPNOTSUPP)
 
 (The POLLHUP makes me believe that in this case the other
 end has already shut down the socket.)
 
 The EOPNOTSUPP confuses ceph quite a bit and prevents it
 from shutting down its side of the socket connection properly.
 
 
 Question: is it possible that the POLLRDHUP causes the
 EOPNOTSUPP to be set by librspreload::poll() or
 rpoll()?
 
 Best Regards
 
 Andreas Bluemle
 
 
 
 



-- 
Andreas Bluemle <andreas.blue...@itxperts.de>
ITXperts GmbH   http://www.itxperts.de
Balanstrasse 73, Geb. 08Phone: (+49) 89 89044917
D-81541 Muenchen (Germany)  Fax:   (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm


using rsockets via librspreload: poll() support?

2013-08-08 Thread Andreas Bluemle
Hi Sean,

I am currently testing rsockets in connection with ceph.
I am using LD_PRELOAD and the librspreload.so to force
the application (ceph) to use rsockets instead of regular
tcp/ip sockets.

All this works pretty well - until the point where an
established connection is shut down: this does not seem to
work and never finishes (unless the application is killed...).

Ceph uses its sockets in nonblocking mode.
When reading from a socket, it polls the socket first
with an event mask of POLLIN and POLLRDHUP.
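
The pattern looks roughly like this (a sketch of the idea, not
ceph's actual code; wait_readable is an illustrative name):

    #define _GNU_SOURCE         /* for POLLRDHUP */
    #include <poll.h>

    /* Wait for readability (or a remote hangup) on a nonblocking
     * socket before reading from it. */
    static short wait_readable(int fd, int timeout_ms)
    {
            struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLRDHUP };

            if (poll(&pfd, 1, timeout_ms) <= 0)
                    return 0;    /* timeout or error */
            return pfd.revents;  /* may include POLLIN, POLLHUP, POLLRDHUP */
    }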

On the return from the poll() I see that
  - POLLIN and POLLHUP are set in the returned events
(POLLRDHUP is *not* set)
  - errno is 95 (EOPNOTSUPP)

(The POLLHUP makes me believe that in this case the other
end has already shut down the socket.)

The EOPNOTSUPP confuses ceph quite a bit and prevents it
from shutting down its side of the socket connection properly.


Question: is it possible that the POLLRDHUP causes the
EOPNOTSUPP to be set by librspreload::poll() or
rpoll()?

Best Regards

Andreas Bluemle




-- 
Andreas Bluemle <andreas.blue...@itxperts.de>
ITXperts GmbH   http://www.itxperts.de
Balanstrasse 73, Geb. 08Phone: (+49) 89 89044917
D-81541 Muenchen (Germany)  Fax:   (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm