@Nikhil Ladha <[email protected]> Please look at this issue.

On Thu, Apr 23, 2020 at 9:02 PM <[email protected]> wrote:
> Did you find any clue in the log files?
>
> I can try an update to 7.5 in case some recent bugs were solved; what's
> your opinion?
>
> ------------------------------
> *From: *"Sanju Rakonde" <[email protected]>
> *To: *[email protected]
> *Cc: *"gluster-users" <[email protected]>
> *Sent: *Wednesday, 22 April 2020 08:39:42
> *Subject: *Re: [Gluster-users] never ending logging
>
> Thanks for all the information.
> For pstack output, the gluster-debuginfo package has to be installed. I will
> check out the provided information and get back to you.
>
> On Wed, Apr 22, 2020 at 11:54 AM <[email protected]> wrote:
>
>> I think all the issues are linked to the same underlying problem.
>>
>> 1. All peers were in Connected state from every node yesterday, but node 2
>> is "semi-connected" now:
>>
>> root@glusterDevVM1:/var/log/glusterfs# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: glusterDevVM3
>> Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
>> State: Peer in Cluster (Connected)
>>
>> Hostname: glusterDevVM2
>> Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
>> State: Peer in Cluster (Connected)
>>
>> root@glusterDevVM2:~# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: glusterDevVM1
>> Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
>> State: Peer in Cluster (Disconnected)
>>
>> Hostname: glusterDevVM3
>> Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
>> State: Peer in Cluster (Connected)
>>
>> root@glusterDevVM3:~# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: glusterDevVM2
>> Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
>> State: Peer in Cluster (Connected)
>>
>> Hostname: glusterDevVM1
>> Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
>> State: Peer in Cluster (Connected)
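Since node 2 is the only one reporting a Disconnected peer, a one-way
network or firewall problem on the management port is worth ruling out
first. A minimal check, assuming glusterd listens on the default port
24007 (hostnames taken from the peer status output above):

    # from glusterDevVM2: is the management port on glusterDevVM1 reachable?
    nc -zv glusterDevVM1 24007
    # compact per-node summary of peer state as glusterd itself sees it
    gluster pool list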
>> 2, 3, 4. A simple gluster volume status shows a different error on each node:
>>
>> root@glusterDevVM1:~# gluster volume status tmp
>> Locking failed on glusterDevVM2. Please check log file for details.
>>
>> root@glusterDevVM2:~# gluster volume status tmp
>> Another transaction is in progress for tmp. Please try again after some time.
>>
>> root@glusterDevVM3:~# gluster volume status tmp
>> Error : Request timed out
>>
>> Logs for each node (excluding the SSL errors):
>>
>> root@glusterDevVM1:~# egrep -v '0-socket.management' /var/log/glusterfs/glusterd.log
>> [2020-04-22 05:38:32.278618] E [rpc-clnt.c:346:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7fd28d99fda8] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcd97)[0x7fd28d745d97] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcebe)[0x7fd28d745ebe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)[0x7fd28d746e93] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xea18)[0x7fd28d747a18] ))))) 0-management: forced unwinding frame type(Peer mgmt) op(--(4)) called at 2020-04-22 05:38:32.243087 (xid=0x8d)
>> [2020-04-22 05:38:32.278638] E [MSGID: 106157] [glusterd-rpc-ops.c:665:__glusterd_friend_update_cbk] 0-management: RPC Error
>> [2020-04-22 05:38:32.278651] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received RJT from uuid: 00000000-0000-0000-0000-000000000000
>> [2020-04-22 05:38:33.256401] I [MSGID: 106498] [glusterd-svc-helper.c:747:__glusterd_send_svc_configure_req] 0-management: not connected yet
>> [2020-04-22 05:38:35.279149] I [socket.c:4347:ssl_setup_connection_params] 0-management: SSL support on the I/O path is ENABLED
>> [2020-04-22 05:38:35.279169] I [socket.c:4350:ssl_setup_connection_params] 0-management: SSL support for glusterd is ENABLED
>> [2020-04-22 05:38:35.279178] I [socket.c:4360:ssl_setup_connection_params] 0-management: using certificate depth 1
>> The message "I [MSGID: 106004] [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer <glusterDevVM2> (<7f6c3023-144b-4db2-9063-d90926dbdd18>), in state <Peer in Cluster>, has disconnected from glusterd." repeated 3 times between [2020-04-22 05:38:25.232116] and [2020-04-22 05:38:35.667153]
>> [2020-04-22 05:38:35.667255] W [glusterd-locks.c:796:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119) [0x7fd287da7119] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae) [0x7fd287db1aae] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3) [0x7fd287e640d3] ) 0-management: Lock for vol <vol> not held
>> [2020-04-22 05:38:35.667275] W [MSGID: 106117] [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not released for <vol>
>> (last 2 lines repeated for each volume)
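The "Lock for vol <vol> not held" / "Lock not released" warnings point to a
stale mgmt_v3 lock left behind by the disconnects, which would also explain
the "Another transaction is in progress" error. As far as I know there is no
CLI command to drop that internal lock; the usual workaround is to restart
glusterd on the node still holding it, which is safe for data access because
the brick processes (glusterfsd) run separately. A sketch:

    # on the node holding the stale lock (here glusterDevVM2, per the error);
    # restarting glusterd does not stop bricks or interrupt client I/O
    systemctl restart glusterd
    # then confirm the management transaction goes through again
    gluster volume status tmp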
>> root@glusterDevVM2:~# egrep -v '0-socket.management' /var/log/glusterfs/glusterd.log
>> [2020-04-22 05:51:57.493574] E [rpc-clnt.c:346:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7f30411dbda8] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcd97)[0x7f3040f81d97] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcebe)[0x7f3040f81ebe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)[0x7f3040f82e93] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xea18)[0x7f3040f83a18] ))))) 0-management: forced unwinding frame type(Gluster MGMT Handshake) op(MGMT-VERS(1)) called at 2020-04-22 05:51:57.483579 (xid=0x563)
>> [2020-04-22 05:51:57.493623] E [MSGID: 106167] [glusterd-handshake.c:2040:__glusterd_mgmt_hndsk_version_cbk] 0-management: Error through RPC layer, retry again later
>> [2020-04-22 05:52:00.501474] I [socket.c:4347:ssl_setup_connection_params] 0-management: SSL support on the I/O path is ENABLED
>> [2020-04-22 05:52:00.501542] I [socket.c:4350:ssl_setup_connection_params] 0-management: SSL support for glusterd is ENABLED
>> [2020-04-22 05:52:00.501569] I [socket.c:4360:ssl_setup_connection_params] 0-management: using certificate depth 1
>> [2020-04-22 05:52:00.983720] I [MSGID: 106004] [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer <glusterDevVM1> (<e2263e4d-a307-45d5-9cec-e1791f7a45fb>), in state <Peer in Cluster>, has disconnected from glusterd.
>> [2020-04-22 05:52:00.983886] W [glusterd-locks.c:796:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119) [0x7f303b5e3119] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae) [0x7f303b5edaae] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3) [0x7f303b6a00d3] ) 0-management: Lock for vol <vol> not held
>> [2020-04-22 05:52:00.983909] W [MSGID: 106117] [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not released for <vol>
>> (last 2 lines repeated for each volume)
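Each reconnect attempt here logs the SSL parameters and then the peer drops
again, so it may also be worth confirming that the renewed certificates are
consistent on all three nodes. A minimal sketch, assuming the stock file
locations /etc/ssl/glusterfs.pem and /etc/ssl/glusterfs.ca are in use:

    # subject and validity window of this node's certificate
    openssl x509 -in /etc/ssl/glusterfs.pem -noout -subject -dates
    # confirm the certificate verifies against the shared CA bundle
    openssl verify -CAfile /etc/ssl/glusterfs.ca /etc/ssl/glusterfs.pem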
>> root@glusterDevVM3:~# egrep -v '0-socket.management' /var/log/glusterfs/glusterd.log
>> [2020-04-22 05:38:33.229959] I [MSGID: 106499] [glusterd-handler.c:4264:__glusterd_handle_status_volume] 0-management: Received status volume req for volume tmp
>> [2020-04-22 05:41:33.230170] I [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: unlock timer is cancelled for volume_type tmp_vol
>> [2020-04-22 05:48:34.908289] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x108, unique = 918, sent = 2020-04-22 05:38:33.230268, timeout = 600 for 10.5.1.7:24007
>> [2020-04-22 05:48:34.908339] E [MSGID: 106115] [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking failed on glusterDevVM1. Please check log file for details.
>> [2020-04-22 05:48:40.288539] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x27, unique = 917, sent = 2020-04-22 05:38:33.230258, timeout = 600 for 10.5.1.8:24007
>> [2020-04-22 05:48:40.288568] E [MSGID: 106115] [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking failed on glusterDevVM2. Please check log file for details.
>> [2020-04-22 05:48:40.288631] E [MSGID: 106150] [glusterd-syncop.c:1918:gd_sync_task_begin] 0-management: Locking Peers Failed.
>>
>> I'm not familiar with pstack; when running it on node 3 (arbiter) I only get these few lines:
>>
>> root@glusterDevVM3:~# pstack 13700
>> 13700: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>> (No symbols found)
>> 0x7fafd747a6cd: ????
>>
>> Which Debian stretch package should I install?
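On the pstack question: Debian has no gluster-debuginfo package; debug
symbols, when available, come as -dbg/-dbgsym packages from whichever
repository shipped the glusterfs debs, so the exact package name has to be
checked rather than assumed. gdb can also dump all thread stacks without
pstack, provided gdb is installed:

    # see which debug-symbol packages the configured repositories offer
    apt-cache search glusterfs | grep -i dbg
    # dump a backtrace of every glusterd thread (read-only, non-destructive)
    gdb -p $(pidof glusterd) -batch -ex 'thread apply all bt'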
>> To be more explicit: I stopped glusterd on all 3 nodes, then restarted them sequentially in this order: node1, node3 (arbiter), then node2.
>> Log files can be downloaded at
>> https://www.dropbox.com/s/rcgcw7jrud2wkj1/glusterd-logs.tar.bz2?dl=0
>>
>> Thanks for your help.
>>
>> ------------------------------
>> *From: *"Sanju Rakonde" <[email protected]>
>> *To: *[email protected]
>> *Cc: *"gluster-users" <[email protected]>
>> *Sent: *Wednesday, 22 April 2020 07:23:52
>> *Subject: *Re: [Gluster-users] never ending logging
>>
>> Hi,
>> The email is talking about many issues. Let me ask a few questions to get the whole picture.
>> 1. Are the peers in the connected state now, or are they still in the rejected state?
>> 2. What led you to see the "locking failed" messages? We would like to know if there is a reproducer, and fix the issue if there is one.
>> 3. The "another transaction is in progress" message appears when an operation is already going on. Are you seeing it when no such transaction is running?
>> 4. When did you hit the timeouts? Did you try to look at the pstack output of the glusterd process? If so, please share it.
>>
>> On Tue, Apr 21, 2020 at 7:08 PM <[email protected]> wrote:
>>
>>> Hi all.
>>>
>>> We're using a 3-node Gluster 7.3 cluster (2 + 1 arbiter). Yesterday node 2 was rejected from the cluster and I applied the following steps to fix it:
>>> https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/
>>> I also saw
>>> https://docs.gluster.org/en/latest/Troubleshooting/troubleshooting-glusterd/
>>> but that solution isn't applicable, as cluster.max-op-version doesn't exist on this version and the op-version is the same on all 3 nodes (see the op-version note after this message).
>>>
>>> After renewing the SSL certs and several restarts, all volumes came back online, but the glusterd log file on all 3 nodes is filled with nothing but the following 3 lines:
>>>
>>> [2020-04-21 13:05:19.478913] I [socket.c:4347:ssl_setup_connection_params] 0-socket.management: SSL support on the I/O path is ENABLED
>>> [2020-04-21 13:05:19.478972] I [socket.c:4350:ssl_setup_connection_params] 0-socket.management: SSL support for glusterd is ENABLED
>>> [2020-04-21 13:05:19.478986] I [socket.c:4360:ssl_setup_connection_params] 0-socket.management: using certificate depth 1
>>>
>>> Moreover, I get "Locking failed", "Another transaction is in progress" and "Error : Request timed out" from gluster volume status volxxx commands. All SSL certs on the clients have also been renewed and all volumes were remounted. All 3 nodes were alternately restarted (glusterd) and rebooted.
>>>
>>> The cluster is not in a production environment, but there are about ~250 clients for ~75 volumes. I don't know how to troubleshoot and fix this problem; any ideas are welcome.
>>
>> --
>> Thanks,
>> Sanju
>
> --
> Thanks,
> Sanju

--
Thanks,
Sanju
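A note on the cluster.max-op-version point in the original message above:
even on builds where that key is not recognized, the cluster op-version can
still be compared across nodes. A short sketch, run on each node:

    # cluster-wide operating version as glusterd reports it
    gluster volume get all cluster.op-version
    # the same value as persisted on disk by glusterd
    grep operating-version /var/lib/glusterd/glusterd.info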
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
