Re: [Gluster-users] not to reconnect between client and server because of race condition
Hi Kaushal,

That is great. This patch could fix my issue.

Thanks,
Xin

At 2016-11-25 14:57:56, "Kaushal M" wrote:
>On Fri, Nov 25, 2016 at 12:03 PM, songxin wrote:
>> Hi Atin,
>> I found a problem: the client (glusterfs) does not try to reconnect to
>> the server (glusterfsd) after a disconnect.
>> Actually, it seems to be caused by a race condition.
>>
>> Precondition
>>
>> The glusterfs version is 3.7.6.
>> I created a replicate volume using two nodes, node A and node B. One brick
>> is on node A and the other brick is on node B.
>> A node ip: 10.32.1.144
>> B node ip: 10.32.0.48
>>
>> The phenomenon is as follows.
>>
>> First, the client (glusterfs) on board A disconnects from the server
>> (glusterfsd) on board B. The log is as follows.
>> ...
>> readv on 10.32.0.48:49309 failed (No data available)
>> ...
>>
>> Then the client (glusterfs) on board A disconnects from the server
>> (glusterfsd) on board A. The log is as follows.
>> ...
>> readv on 10.32.1.144:49391 failed (Connection reset by peer)
>> ...
>>
>> After that, every operation on the mount point shows "Transport endpoint is
>> not connected" until the client reconnects to the server (glusterfsd) on
>> board B.
>>
>> The client log follows. I have highlighted the important lines.
>> ...
Re: [Gluster-users] not to reconnect between client and server because of race condition
Hi Kaushal,

Thank you for your reply. I will check whether this patch fixes my problem.

Thanks,
Xin

At 2016-11-25 14:57:56, "Kaushal M" wrote:
>On Fri, Nov 25, 2016 at 12:03 PM, songxin wrote:
>> Hi Atin,
>> I found a problem: the client (glusterfs) does not try to reconnect to
>> the server (glusterfsd) after a disconnect.
>> Actually, it seems to be caused by a race condition.
>>
>> Precondition
>>
>> The glusterfs version is 3.7.6.
>> I created a replicate volume using two nodes, node A and node B. One brick
>> is on node A and the other brick is on node B.
>> A node ip: 10.32.1.144
>> B node ip: 10.32.0.48
>>
>> The phenomenon is as follows.
>>
>> First, the client (glusterfs) on board A disconnects from the server
>> (glusterfsd) on board B. The log is as follows.
>> ...
>> readv on 10.32.0.48:49309 failed (No data available)
>> ...
>>
>> Then the client (glusterfs) on board A disconnects from the server
>> (glusterfsd) on board A. The log is as follows.
>> ...
>> readv on 10.32.1.144:49391 failed (Connection reset by peer)
>> ...
>>
>> After that, every operation on the mount point shows "Transport endpoint is
>> not connected" until the client reconnects to the server (glusterfsd) on
>> board B.
>>
>> The client log follows. I have highlighted the important lines.
>> ...
Re: [Gluster-users] not to reconnect between client and server because of race condition
On Fri, Nov 25, 2016 at 12:03 PM, songxin wrote:
> Hi Atin,
> I found a problem: the client (glusterfs) does not try to reconnect to
> the server (glusterfsd) after a disconnect.
> Actually, it seems to be caused by a race condition.
>
> Precondition
>
> The glusterfs version is 3.7.6.
> I created a replicate volume using two nodes, node A and node B. One brick
> is on node A and the other brick is on node B.
> A node ip: 10.32.1.144
> B node ip: 10.32.0.48
>
> The phenomenon is as follows.
>
> First, the client (glusterfs) on board A disconnects from the server
> (glusterfsd) on board B. The log is as follows.
> ...
> readv on 10.32.0.48:49309 failed (No data available)
> ...
>
> Then the client (glusterfs) on board A disconnects from the server
> (glusterfsd) on board A. The log is as follows.
> ...
> readv on 10.32.1.144:49391 failed (Connection reset by peer)
> ...
>
> After that, every operation on the mount point shows "Transport endpoint is
> not connected" until the client reconnects to the server (glusterfsd) on
> board B.
>
> The client log follows. I have highlighted the important lines.
> ...
[Gluster-users] not to reconnect between client and server because of race condition
Hi Atin,

I found a problem: the client (glusterfs) does not try to reconnect to the server (glusterfsd) after a disconnect.
Actually, it seems to be caused by a race condition.

Precondition

The glusterfs version is 3.7.6.
I created a replicate volume using two nodes, node A and node B. One brick is on node A and the other brick is on node B.
A node ip: 10.32.1.144
B node ip: 10.32.0.48

The phenomenon is as follows.

First, the client (glusterfs) on board A disconnects from the server (glusterfsd) on board B. The log is as follows.
...
readv on 10.32.0.48:49309 failed (No data available)
...

Then the client (glusterfs) on board A disconnects from the server (glusterfsd) on board A. The log is as follows.
...
readv on 10.32.1.144:49391 failed (Connection reset by peer)
...

After that, every operation on the mount point shows "Transport endpoint is not connected" until the client reconnects to the server (glusterfsd) on board B.

The client log follows. I have highlighted the important lines.
...
[2016-10-31 04:06:03.626047] W [socket.c:588:__socket_rwv] 0-c_glusterfs-client-9: readv on 10.32.1.144:49391 failed (Connection reset by peer)
[2016-10-31 04:06:03.627345] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xb5c80)[0x3fff8ab79f58] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind-0x1b7a0)[0x3fff8ab1dc90] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy-0x1b638)[0x3fff8ab1de10] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup-0x19af8)[0x3fff8ab1fb18] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify-0x18e68)[0x3fff8ab20808] ) 0-c_glusterfs-client-9: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2016-10-31 04:06:03.626033 (xid=0x7f5e)
[2016-10-31 04:06:03.627395] E [MSGID: 114031] [client-rpc-fops.c:1673:client3_3_finodelk_cbk] 0-c_glusterfs-client-9: remote operation failed [Transport endpoint is not connected]
[2016-10-31 04:06:03.628381] I [socket.c:3308:socket_submit_request] 0-c_glusterfs-client-9: not connected (priv->connected = 0)
[2016-10-31 04:06:03.628432] W [rpc-clnt.c:1586:rpc_clnt_submit] 0-c_glusterfs-client-9: failed to submit rpc-request (XID: 0x7f5f Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (c_glusterfs-client-9)
[2016-10-31 04:06:03.628466] E [MSGID: 114031] [client-rpc-fops.c:1673:client3_3_finodelk_cbk] 0-c_glusterfs-client-9: remote operation failed [Transport endpoint is not connected]
[2016-10-31 04:06:03.628475] I [MSGID: 108019] [afr-lk-common.c:1086:afr_lock_blocking] 0-c_glusterfs-replicate-0: unable to lock on even one child
[2016-10-31 04:06:03.628539] I [MSGID: 108019] [afr-transaction.c:1224:afr_post_blocking_inodelk_cbk] 0-c_glusterfs-replicate-0: Blocking inodelks failed.
[2016-10-31 04:06:03.628630] W [fuse-bridge.c:1282:fuse_err_cbk] 0-glusterfs-fuse: 20790: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2016-10-31 04:06:03.629149] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xb5c80)[0x3fff8ab79f58] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind-0x1b7a0)[0x3fff8ab1dc90] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy-0x1b638)[0x3fff8ab1de10] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup-0x19af8)[0x3fff8ab1fb18] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify-0x18e68)[0x3fff8ab20808] ) 0-c_glusterfs-client-9: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-10-31 04:06:03.624346 (xid=0x7f5a)
[2016-10-31 04:06:03.629183] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-c_glusterfs-client-9: changing port to 49391 (from 0)
[2016-10-31 04:06:03.629210] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-c_glusterfs-client-9: remote operation failed. Path: /loadmodules_norepl/CXC1725605_P93A001/cello/emasviews (b0e5a94e-a432-4dce-b86f-a551555780a2) [Transport endpoint is not connected]
[2016-10-31 04:06:03.629266] I [socket.c:3308:socket_submit_request] 0-c_glusterfs-client-9: not connected (priv->connected = 255)
[2016-10-31 04:06:03.629277] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 0-c_glusterfs-dht: Found anomalies in /loadmodules_norepl/CXC1725605_P93A001/cello/emasviews (gfid = b0e5a94e-a432-4dce-b86f-a551555780a2). Holes=1 overlaps=0
[2016-10-31 04:06:03.629293] W [rpc-clnt.c:1586:rpc_clnt_submit] 0-c_glusterfs-client-9: failed to submit rpc-request (XID: 0x7f62 Program: GlusterFS 3.3, ProgVers: 330, Proc: 41) to rpc-transport (c_glusterfs-client-9)
[2016-10-31 04:06:03.629333] W