Hi,

Last week I upgraded one of my gluster clusters (3 hosts with bricks as replica 
3) to 3.6.4 from 3.5.4 and all seemed well.

Today I am getting reports that locking has failed:


gfse-cant-01:/var/log/glusterfs# gluster volume status
Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log file for details.
Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log file for details.

Logs:
[2015-08-03 13:45:29.974560] E [glusterd-syncop.c:1640:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-08-03 13:49:48.273159] E [glusterd-syncop.c:105:gd_collate_errors] 0-: Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log file for details.
[2015-08-03 13:49:48.273778] E [glusterd-syncop.c:105:gd_collate_errors] 0-: Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log file for details.


I am wondering whether this is new behaviour introduced in 3.6.4 or whether
something has gone wrong.

Restarting gluster entirely (btw the restart script does not actually appear to
kill the processes...) resolves the issue, but it then recurs a few minutes
later, which is rather suboptimal for a running service.

Googling suggests that there may be simultaneous actions going on that can 
cause a locking issue.

I know that I have nagios running volume status <volname> for each of my
volumes on each host every few minutes. However, this is not new: it has been
in place for the last 8-9 months against 3.5 without issue, so I would hope
that it is not the cause.
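
One way to test whether the nagios checks are colliding would be to wrap the
status call so that it backs off and retries when the lock is busy. Below is a
rough Python sketch of that idea. The helper names (gluster_volume_status,
status_with_retry), the number of attempts and the delays are placeholders I
made up; only the "gluster volume status <volname>" command and the "Locking
failed" text come from the output above.

```python
import random
import subprocess
import time

# Message text as it appears in the CLI output quoted above.
LOCK_ERROR = "Locking failed"


def gluster_volume_status(volname):
    """Run `gluster volume status <volname>`; return (exit_code, combined output)."""
    proc = subprocess.run(
        ["gluster", "volume", "status", volname],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr


def status_with_retry(volname, run=gluster_volume_status, attempts=4,
                      base_delay=2.0, sleep=time.sleep, rng=random.random):
    """Retry the status call with jittered backoff while locking fails.

    Returns (exit_code, output, tries_used). The defaults are illustrative
    guesses, not tuned values.
    """
    for attempt in range(1, attempts + 1):
        code, output = run(volname)
        if LOCK_ERROR not in output:
            return code, output, attempt
        if attempt < attempts:
            # Exponential backoff plus random jitter, so that checks on
            # different hosts do not all retry at the same moment.
            sleep(base_delay * 2 ** (attempt - 1) + rng() * base_delay)
    return code, output, attempts
```

If the failures disappear once the checks are staggered like this, that would
point at simultaneous status calls; if they persist, the checks are probably
not the trigger.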

I am not sure where to look now tbh.




Paul Osborne
Senior Systems Engineer
Canterbury Christ Church University
Tel: 01227 782751
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
