Hi Jeff,

I missed adding this:
SSL_pending() was 0 before calling SSL_read(), and hence SSL_get_error() returned SSL_ERROR_WANT_READ.
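
For context, here is a minimal sketch (plain OpenSSL usage, not the GlusterFS code; the file name and demo function are hypothetical) of the sequence observed above: SSL_pending() is 0, SSL_read() cannot make progress, and SSL_get_error() classifies the failure as SSL_ERROR_WANT_READ, i.e. OpenSSL needs more bytes from the transport:

<code: want_read_demo.c (hypothetical)>
#include <assert.h>
#include <openssl/ssl.h>

/* Assumes 'ssl' sits on a non-blocking socket with no data queued. */
void demo (SSL *ssl)
{
        char buf[4096];
        int  r;

        assert (SSL_pending (ssl) == 0);  /* nothing decrypted and buffered */
        r = SSL_read (ssl, buf, sizeof (buf));
        assert (r <= 0);                  /* the read could not make progress */
        assert (SSL_get_error (ssl, r) == SSL_ERROR_WANT_READ);
}
</code>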

Thanks,
Vijay


On Tuesday 24 June 2014 05:15 PM, Vijaikumar M wrote:
Hi Jeff,

This is regarding patch http://review.gluster.org/#/c/3842/ (epoll: edge triggered and multi-threaded epoll). The test case './tests/bugs/bug-873367.t' hangs with this fix (please find the stack traces below).

In the code snippet below we found that 'SSL_pending' was returning 0, so I have added a condition to return from the function when there is no data available. Is it OK to handle it this way, or do we need to restructure this function for multi-threaded epoll?

<code: socket.c>
 178 static int
 179 ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
 180 {
 ....
 211                 switch (SSL_get_error(priv->ssl_ssl,r)) {
 212                 case SSL_ERROR_NONE:
 213                         return r;
 214                 case SSL_ERROR_WANT_READ:
 215                         if (SSL_pending(priv->ssl_ssl) == 0)
 216                                 return r;
 217                         pfd.fd = priv->sock;
 ....
 221                         if (poll(&pfd,1,-1) < 0) {
</code>
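
If restructuring does turn out to be necessary, one possible direction would be for ssl_do never to block in poll() with an infinite timeout while holding priv->lock under edge-triggered multi-threaded epoll. A rough sketch of that idea follows (hypothetical code, not the actual patch; the helper name my_ssl_read and the EAGAIN convention are assumptions):

<code: sketch (hypothetical), not the actual patch>
#include <errno.h>
#include <openssl/ssl.h>

/* Surface the would-block condition to the caller, which can drop the
 * lock and let epoll re-arm the fd, instead of blocking in poll(). */
static ssize_t
my_ssl_read (SSL *ssl, void *buf, size_t len)
{
        int r = SSL_read (ssl, buf, (int) len);

        if (r > 0)
                return r;

        switch (SSL_get_error (ssl, r)) {
        case SSL_ERROR_WANT_READ:
        case SSL_ERROR_WANT_WRITE:
                /* SSL_pending() is 0 and the transport would block:
                 * report EAGAIN so the event loop retries later. */
                errno = EAGAIN;
                return -1;
        case SSL_ERROR_ZERO_RETURN:
                return 0;       /* clean TLS shutdown */
        default:
                errno = EIO;
                return -1;
        }
}
</code>

The key point of the sketch is that the transport lock would never be held across a blocking poll() with an infinite timeout.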



Thanks,
Vijay

On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
From the stack traces we found that the function 'socket_submit_request' is waiting on a mutex lock. The lock is held by the function 'ssl_do', which is itself blocked in the poll() syscall.


(gdb) bt
#0  0x0000003daa80822d in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f3b94eea9d0 in event_dispatch_epoll (event_pool=<value optimized out>) at event-epoll.c:632
#2  0x0000000000407ecd in main (argc=4, argv=0x7fff160a4528) at glusterfsd.c:2023


(gdb) info threads
  10 Thread 0x7f3b8d483700 (LWP 26225)  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
   9 Thread 0x7f3b8ca82700 (LWP 26226)  0x0000003daa80f4b5 in sigwait () from /lib64/libpthread.so.0
   8 Thread 0x7f3b8c081700 (LWP 26227)  0x0000003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   7 Thread 0x7f3b8b680700 (LWP 26228)  0x0000003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   6 Thread 0x7f3b8a854700 (LWP 26232)  0x0000003daa4e9163 in epoll_wait () from /lib64/libc.so.6
   5 Thread 0x7f3b89e53700 (LWP 26233)  0x0000003daa4e9163 in epoll_wait () from /lib64/libc.so.6
   4 Thread 0x7f3b833eb700 (LWP 26241)  0x0000003daa4df343 in poll () from /lib64/libc.so.6
   3 Thread 0x7f3b82130700 (LWP 26245)  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
   2 Thread 0x7f3b8172f700 (LWP 26247)  0x0000003daa80e75d in read () from /lib64/libpthread.so.0
 * 1 Thread 0x7f3b94a38700 (LWP 26224)  0x0000003daa80822d in pthread_join () from /lib64/libpthread.so.0


(gdb) thread 3
[Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]
#0  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x0000003daa8093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, req=0x7f3b8212f0b0) at socket.c:3134
#4  0x00007f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, prog=<value optimized out>, procnum=<value optimized out>, cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, proghdr=0x7f3b8212f410, proghdrcount=1, progpayload=0x0, progpayloadcount=0, iobref=<value optimized out>, frame=0x7f3b93d2a454, rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0)
    at rpc-clnt.c:1556
#5  0x00007f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, req=<value optimized out>, frame=0x7f3b93d2a454, prog=0x7f3b894525a0, procnum=27, cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, iobref=0x0, rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,
    xdrproc=0x7f3b94a4ede0 <xdr_gfs3_lookup_req>) at client.c:243
#6  0x00007f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, this=0x7f3b7c005ef0, data=0x7f3b8212f660)
    at client-rpc-fops.c:3119


(gdb) p priv->lock
$1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 1, __kind = 0, __spins = 0,
      __list = {__prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000\201f\000\000\001", '\000' <repeats 26 times>, __align = 2}

Note that __owner = 26241: the lock is held by LWP 26241, which is thread 4 below, i.e. the thread blocked in poll() inside ssl_do.


(gdb) thread 4
[Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]
#0  0x0000003daa4df343 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003daa4df343 in poll () from /lib64/libc.so.6
#1  0x00007f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, buf=0x7f3b7c051264, len=4, func=0x3db2441570 <SSL_read>)
    at socket.c:216
#2  0x00007f3b8aa7277b in __socket_ssl_readv (this=<value optimized out>, opvector=<value optimized out>,
    opcount=<value optimized out>) at socket.c:335
#3  0x00007f3b8aa72c26 in __socket_cached_read (this=<value optimized out>, vector=<value optimized out>, count=<value optimized out>, pending_vector=0x7f3b7c051258, pending_count=0x7f3b7c051260, bytes=0x0, write=0)
    at socket.c:422
#4  __socket_rwv (this=<value optimized out>, vector=<value optimized out>, count=<value optimized out>, pending_vector=0x7f3b7c051258, pending_count=0x7f3b7c051260, bytes=0x0, write=0) at socket.c:496
#5  0x00007f3b8aa76040 in __socket_readv (this=0x7f3b7c0505c0) at socket.c:589
#6  __socket_proto_state_machine (this=0x7f3b7c0505c0) at socket.c:1966
#7  socket_proto_state_machine (this=0x7f3b7c0505c0) at socket.c:2106
#8  socket_event_poll_in (this=0x7f3b7c0505c0) at socket.c:2127
#9  0x00007f3b8aa77820 in socket_poller (ctx=0x7f3b7c0505c0) at socket.c:2338
#10 0x0000003daa8079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003daa4e8b6d in clone () from /lib64/libc.so.6
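
The traces above reduce to the following pattern. Here is a minimal standalone reproduction (hypothetical code, not GlusterFS): one thread takes a mutex and then blocks in poll() with an infinite timeout, and a second thread then hangs in pthread_mutex_lock():

<code: deadlock_sketch.c (hypothetical)>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of ssl_do: takes the lock, then blocks in poll(). */
static void *
reader (void *arg)
{
        int *pipefd = arg;
        struct pollfd pfd = { .fd = pipefd[0], .events = POLLIN };

        pthread_mutex_lock (&lock);
        poll (&pfd, 1, -1);             /* blocks forever: nothing is ever written */
        pthread_mutex_unlock (&lock);
        return NULL;
}

/* Plays the role of socket_submit_request: hangs on the held lock. */
static void *
writer (void *arg)
{
        (void) arg;
        sleep (1);                      /* let the reader take the lock first */
        pthread_mutex_lock (&lock);     /* __lll_lock_wait, as in thread 3 */
        puts ("never reached");
        pthread_mutex_unlock (&lock);
        return NULL;
}

int
main (void)
{
        int       pipefd[2];
        pthread_t a, b;

        if (pipe (pipefd) != 0)
                return 1;
        pthread_create (&a, NULL, reader, pipefd);
        pthread_create (&b, NULL, writer, NULL);
        pthread_join (a, NULL);         /* mirrors the hang in event_dispatch_epoll */
        return 0;
}
</code>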

Thanks,
Vijay


On Tuesday 24 June 2014 08:59 AM, Raghavendra Gowdappa wrote:
OK. Sorry, I didn't look at the change #. I'll sync up with Vijay.

----- Original Message -----
From: "Anand Avati"<av...@redhat.com>
To: "Raghavendra Gowdappa"<rgowd...@redhat.com>
Cc:vmall...@redhat.com
Sent: Tuesday, June 24, 2014 8:55:34 AM
Subject: Re: Change in glusterfs[master]: epoll: Handle client and server FDs 
in a separate event pool

On 6/23/14, 8:00 PM, Raghavendra Gowdappa wrote:
----- Original Message -----
From: "Raghavendra Gowdappa"<rgowd...@redhat.com>
To: "Anand Avati"<av...@redhat.com>
Cc:vmall...@redhat.com
Sent: Tuesday, June 24, 2014 8:28:41 AM
Subject: Re: Change in glusterfs[master]: epoll: Handle client and server
FDs in a separate event pool



----- Original Message -----
From: "Anand Avati"<av...@redhat.com>
To:vmall...@redhat.com
Cc: "Raghavendra G"<rgowd...@redhat.com>
Sent: Monday, June 23, 2014 10:07:19 PM
Subject: Re: Change in glusterfs[master]: epoll: Handle client and server
FDs in a separate event pool

On 6/22/14, 8:47 PM, Vijaikumar Mallikarjuna (Code Review) wrote:
Vijaikumar Mallikarjuna has posted comments on this change.

Change subject: epoll: Handle client and server FDs in a separate event
pool
......................................................................


Patch Set 9:

Hi Avati,

Actually, we started working on the fix for Bug# 1096729, which was a blocker
issue. We tried multiple approaches to avoid changing the current epoll model
for now; however, we had to make some changes in the epoll code and ended up
with this patch.


The MT patch #3842 looks good to me. It would be great if you could help us
get the patch in quickly.

Thanks,
Vijay

Copying Raghavendra as he's the RPC guy. Du, #3842 has been blocked in review
for a long time because of some incompatibility with RPC SSL mode, very
likely an issue in our SSL multi-threading code. Can you help Vijay debug
this and move #3842 forward? Also, there are new SSL patches from Jeff
upstream; can you check whether they fix this problem?
Sure, I'll try to sync up with Vijay.
However, I have a doubt about the approach we should take. Doesn't your
multi-threaded epoll patch also fix this issue? Given that yours is a generic
solution, shouldn't it be favoured over this one?

That's precisely what I meant: #3842 (the more generic MT epoll) is
having some issues with the SSL MT code (otherwise it is working fine).




_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
