On Thu, 27 Jul 2017 at 16:48, Paolo Margara <paolo.marg...@polito.it> wrote:
> Hi Atin,
>
> in attachment all the requested logs.
>
> Considering that I'm using gluster as a storage system for oVirt, I've also checked the oVirt logs and I've seen that almost every command on all three nodes is executed by the supervdsm daemon, and not only by the SPM node. Could this be the root cause of this problem?

Indeed.

> Greetings,
>
> Paolo
>
> PS: could you suggest a better method than attachments for sharing log files?
>
> On 26/07/2017 15:28, Atin Mukherjee wrote:
>
> Technically, if only one node is pumping all these status commands, you shouldn't get into this situation. Can you please help me with the latest cmd_history and glusterd log files from all the nodes?
>
> On Wed, Jul 26, 2017 at 1:41 PM, Paolo Margara <paolo.marg...@polito.it> wrote:
>
>> Hi Atin,
>>
>> I had initially disabled the gluster status check on all nodes except one in my nagios instance, as you recommended, but the issue happened again.
>>
>> So I've disabled it on every node, but the error still occurs; currently only oVirt is monitoring gluster.
>>
>> I cannot modify this behaviour in the oVirt GUI. Is there anything I could do from the gluster perspective to solve this issue? Considering that 3.8 is near EOL, upgrading to 3.10 could also be an option.
>>
>> Greetings,
>>
>> Paolo
>>
>> On 20/07/2017 15:37, Paolo Margara wrote:
>>
>> OK, on my nagios instance I've disabled the gluster status check on all nodes except one; I'll check whether this is enough.
>>
>> Thanks,
>>
>> Paolo
>>
>> On 20/07/2017 13:50, Atin Mukherjee wrote:
>>
>> So from the cmd_history.logs across all the nodes it's evident that multiple commands on the same volume are run simultaneously, which can result in transaction collisions, and you can end up with one command succeeding and the others failing. Ideally, if you are running the volume status command for monitoring, it should be run from only one node.
>>
>> On Thu, Jul 20, 2017 at 3:54 PM, Paolo Margara <paolo.marg...@polito.it> wrote:
>>
>>> In attachment the requested logs for all three nodes.
>>>
>>> Thanks,
>>>
>>> Paolo
>>>
>>> On 20/07/2017 11:38, Atin Mukherjee wrote:
>>>
>>> Please share the cmd_history.log file from all the storage nodes.
>>>
>>> On Thu, Jul 20, 2017 at 2:34 PM, Paolo Margara <paolo.marg...@polito.it> wrote:
>>>
>>>> Hi list,
>>>>
>>>> recently I've noticed a strange behaviour of my gluster storage: sometimes, while executing a simple command like "gluster volume status vm-images-repo", I get "Another transaction is in progress for vm-images-repo. Please try again after sometime." as a response. This situation does not resolve itself simply by waiting; I have to restart glusterd on the node that holds (and does not release) the lock. It occurs randomly, every few days. In the meanwhile, before and after the issue appears, everything works as expected.
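For the record, a quick way to spot which node is still holding the stale lock, without blindly restarting glusterd everywhere, is something like the following (a rough sketch from memory; the log path matches a default CentOS install, adjust it to your setup):

    # on the node that returned "Another transaction is in progress":
    grep "held by" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail -n 5

    # map the reported UUID to a hostname; the local node's own UUID is also in /var/lib/glusterd/glusterd.info
    gluster pool list

    # then restart glusterd only on the node that owns that UUID:
    systemctl restart glusterd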
>>>> I'm using gluster 3.8.12 on CentOS 7.3. The only relevant information that I found in the log file (etc-glusterfs-glusterd.vol.log) of my three nodes is the following:

>>>> * node1, at the moment the issue begins:

>>>> [2017-07-19 15:07:43.130203] W [glusterd-locks.c:572:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x3a00f) [0x7f373f25f00f] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x2ba25) [0x7f373f250a25] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0xd048f) [0x7f373f2f548f] ) 0-management: Lock for vm-images-repo held by 2c6f154f-efe3-4479-addc-b2021aa9d5df
>>>> [2017-07-19 15:07:43.128242] I [MSGID: 106499] [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vm-images-repo
>>>> [2017-07-19 15:07:43.130244] E [MSGID: 106119] [glusterd-op-sm.c:3782:glusterd_op_ac_lock] 0-management: Unable to acquire lock for vm-images-repo
>>>> [2017-07-19 15:07:43.130320] E [MSGID: 106376] [glusterd-op-sm.c:7775:glusterd_op_sm] 0-management: handler returned: -1
>>>> [2017-07-19 15:07:43.130665] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Locking failed on virtnode-0-1-gluster. Please check log file for details.
>>>> [2017-07-19 15:07:43.131293] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Locking failed on virtnode-0-2-gluster. Please check log file for details.
>>>> [2017-07-19 15:07:43.131360] E [MSGID: 106151] [glusterd-syncop.c:1884:gd_sync_task_begin] 0-management: Locking Peers Failed.
>>>> [2017-07-19 15:07:43.132005] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on virtnode-0-2-gluster. Please check log file for details.
>>>> [2017-07-19 15:07:43.132182] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on virtnode-0-1-gluster. Please check log file for details.

>>>> * node2, at the moment the issue begins:

>>>> [2017-07-19 15:07:43.131975] W [glusterd-locks.c:572:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x3a00f) [0x7f17b5b9e00f] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x2ba25) [0x7f17b5b8fa25] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0xd048f) [0x7f17b5c3448f] ) 0-management: Lock for vm-images-repo held by d9047ecd-26b5-467b-8e91-50f76a0c4d16
>>>> [2017-07-19 15:07:43.132019] E [MSGID: 106119] [glusterd-op-sm.c:3782:glusterd_op_ac_lock] 0-management: Unable to acquire lock for vm-images-repo
>>>> [2017-07-19 15:07:43.133568] W [glusterd-locks.c:686:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x3a00f) [0x7f17b5b9e00f] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x2b712) [0x7f17b5b8f712] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0xd082a) [0x7f17b5c3482a] ) 0-management: Lock owner mismatch. Lock for vol vm-images-repo held by d9047ecd-26b5-467b-8e91-50f76a0c4d16
>>>> [2017-07-19 15:07:43.133597] E [MSGID: 106118] [glusterd-op-sm.c:3845:glusterd_op_ac_unlock] 0-management: Unable to release lock for vm-images-repo
>>>> The message "E [MSGID: 106376] [glusterd-op-sm.c:7775:glusterd_op_sm] 0-management: handler returned: -1" repeated 3 times between [2017-07-19 15:07:42.976193] and [2017-07-19 15:07:43.133646]
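The "Lock owner mismatch" on node2 is the usual sign of two transactions racing for the same volume lock. To confirm that more than one node issued a command in the same second, it may help to cross-check cmd_history.log on all three nodes around that timestamp; roughly like this (hostnames taken from the peer names above; the ssh loop and paths are only a sketch):

    for h in virtnode-0-0-gluster virtnode-0-1-gluster virtnode-0-2-gluster; do
      echo "== $h =="
      ssh "$h" "grep '2017-07-19 15:07:4' /var/log/glusterfs/cmd_history.log"
    done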
>>>> * node3, at the moment the issue begins:

>>>> [2017-07-19 15:07:42.976593] I [MSGID: 106499] [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vm-images-repo
>>>> [2017-07-19 15:07:43.129941] W [glusterd-locks.c:572:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x3a00f) [0x7f6133f5b00f] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x2ba25) [0x7f6133f4ca25] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0xd048f) [0x7f6133ff148f] ) 0-management: Lock for vm-images-repo held by d9047ecd-26b5-467b-8e91-50f76a0c4d16
>>>> [2017-07-19 15:07:43.129981] E [MSGID: 106119] [glusterd-op-sm.c:3782:glusterd_op_ac_lock] 0-management: Unable to acquire lock for vm-images-repo
>>>> [2017-07-19 15:07:43.130034] E [MSGID: 106376] [glusterd-op-sm.c:7775:glusterd_op_sm] 0-management: handler returned: -1
>>>> [2017-07-19 15:07:43.130131] E [MSGID: 106275] [glusterd-rpc-ops.c:876:glusterd_mgmt_v3_lock_peers_cbk_fn] 0-management: Received mgmt_v3 lock RJT from uuid: 2c6f154f-efe3-4479-addc-b2021aa9d5df
>>>> [2017-07-19 15:07:43.130710] W [glusterd-locks.c:686:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x3a00f) [0x7f6133f5b00f] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0x2b712) [0x7f6133f4c712] -->/usr/lib64/glusterfs/3.8.12/xlator/mgmt/glusterd.so(+0xd082a) [0x7f6133ff182a] ) 0-management: Lock owner mismatch. Lock for vol vm-images-repo held by d9047ecd-26b5-467b-8e91-50f76a0c4d16
>>>> [2017-07-19 15:07:43.130733] E [MSGID: 106118] [glusterd-op-sm.c:3845:glusterd_op_ac_unlock] 0-management: Unable to release lock for vm-images-repo
>>>> [2017-07-19 15:07:43.130771] E [MSGID: 106376] [glusterd-op-sm.c:7775:glusterd_op_sm] 0-management: handler returned: -1

>>>> The really strange thing is that in this case the UUID reported is d9047ecd-26b5-467b-8e91-50f76a0c4d16, which is node3 itself!

>>>> The nodename-to-UUID mapping is:

>>>> * (node1) virtnode-0-0-gluster: 2c6f154f-efe3-4479-addc-b2021aa9d5df
>>>> * (node2) virtnode-0-1-gluster: e93ebee7-5d95-4100-a9df-4a3e60134b73
>>>> * (node3) virtnode-0-2-gluster: d9047ecd-26b5-467b-8e91-50f76a0c4d16

>>>> In this case restarting glusterd on node3 usually solves the issue.

>>>> What could be the root cause of this behavior? How can I fix it once and for all?

>>>> If needed I can provide the full log file.

>>>> Greetings,

>>>> Paolo Margara

--
- Atin (atinm)