Can you please capture output of "pstack $(pidof glusterd)" and send it to us? We need to capture this information when glusterd is struck.
On Mon, Apr 8, 2019 at 8:05 AM Zhou, Cynthia (NSB - CN/Hangzhou) < cynthia.z...@nokia-sbell.com> wrote: > Hi glusterfs experts, > > Good day! > > In my test env, sometimes glusterd stuck issue happened, and it is not > responding to any gluster commands, when I checked this issue I find that > glusterd thread 9 and thread 8 is dealing with the same socket, I thought > following patch should be able to solve this issue, however after I merged > this patch this issue still exist. When I looked into this code, it seems > socket_event_poll_in called event_handled before > rpc_transport_pollin_destroy, I think this gives the chance for another > poll for the exactly the same socket. And caused this glusterd stuck issue, > also, I find there is no LOCK_DESTROY(&iobref->lock) > > In iobref_destroy, I think it is better to add destroy lock. > > Following is the gdb info when this issue happened, I would like to know > your opinion on this issue, thanks! > > > > SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537 > > > > ** socket: fix issue on concurrent handle of a socket* > > > > > > > > *GDB INFO:* > > Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in > iobref_unref, I think > > Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)): > > #0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0 > > #1 0x00007f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0 > > #2 0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at > iobuf.c:944 > > #3 0x00007f9eeafd2f29 in rpc_transport_pollin_destroy > (pollin=0x7f9ed00452d0) at rpc-transport.c:123 > > #4 0x00007f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, > notify_handled=_gf_true) at socket.c:2322 > > #5 0x00007f9ee4fbf932 in socket_event_handler (*fd=36, idx=27, gen=4, > data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2471 > > #6 0x00007f9eeb2825d4 in event_dispatch_epoll_handler > (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583 > > #7 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at > event-epoll.c:659 > > #8 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0 > > #9 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6 > > > > Thread 8 (Thread 0x7f9edffff700 (LWP 1932)): > > #0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0 > > #1 0x00007f9ee9fd2b42 in __pthread_mutex_cond_lock () from > /lib64/libpthread.so.0 > > #2 0x00007f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > > #3 0x00007f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0, > gen=4, idx=27) at socket.c:1201 > > #4 0x00007f9ee4fbf99c in socket_event_handler (*fd=36, idx=27, gen=4, > data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2480 > > #5 0x00007f9eeb2825d4 in event_dispatch_epoll_handler > (event_pool=0x17feb00, event=0x7f9edfffee84) at event-epoll.c:583 > > #6 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at > event-epoll.c:659 > > #7 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0 > > #8 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6 > > > > (gdb) thread 9 > > [Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))] > > #0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0 > > (gdb) bt > > #0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0 > > #1 0x00007f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0 > > #2 0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at > iobuf.c:944 > > #3 0x00007f9eeafd2f29 in rpc_transport_pollin_destroy > (pollin=0x7f9ed00452d0) at rpc-transport.c:123 > > #4 0x00007f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, > notify_handled=_gf_true) at socket.c:2322 > > #5 0x00007f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, > data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471 > > #6 0x00007f9eeb2825d4 in event_dispatch_epoll_handler > (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583 > > #7 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at > event-epoll.c:659 > > #8 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0 > > #9 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6 > > (gdb) frame 2 > > #2 0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at > iobuf.c:944 > > 944 iobuf.c: No such file or directory. > > (gdb) print *iobref > > $1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222, > __owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512, > > __elision = 0, __list = {__prev = 0x4000, __next = > 0x7f9ed00063b000}}, > > __size = > "\002\000\000\000\336\000\000\000\000\260\234\201\001\000\000\000\000#\000\000\000\002\000\000\000@\000\000\000\000\000\000\000\260c\000О\177", > __align = 953482739714}}, ref = -256, iobrefs = 0xffffffffffffffff, alloced > = -1, used = -1} > > (gdb) quit > > A debugging session is active. > _______________________________________________ > Gluster-devel mailing list > Gluster-devel@gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel -- Thanks, Sanju
_______________________________________________ Gluster-devel mailing list Gluster-devel@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel