Re: [Gluster-users] Locking failed - since upgrade to 3.6.4

2015-08-03 Thread Osborne, Paul (paul.osbo...@canterbury.ac.uk)
Hi,


 [2015-08-03 14:51:57.791081] E [glusterd-utils.c:148:glusterd_lock] 
 0-management: Unable to get lock for uuid: 
 76e4398c-e00a-4f3b-9206-4f885c4e5206, lock held by: 
 76e4398c-e00a-4f3b-9206-4f885c4e5206



 This indicates that the cluster is still operating at an older op-version. You
 would need to bump the op-version up to 30604 using:

 gluster volume set all cluster.op-version 30604


Hmm, it would be helpful if that were in the upgrade documentation in an
obvious location.


Anyhow:


# gluster volume set all cluster.op-version 30604
volume set: failed: Required op_version (30604) is not supported


Not so good.
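
For anyone following along: the op-version a node is actually running at can be
read from glusterd's info file (path assuming the default Debian layout):

# current cluster op-version as recorded by this glusterd
grep operating-version /var/lib/glusterd/glusterd.info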


dpkg --list | grep glus
ii  glusterfs-client   3.6.4-1   amd64  
  clustered file-system (client package)
ii  glusterfs-common   3.6.4-1   amd64  
  GlusterFS common libraries and translator modules
ii  glusterfs-server   3.6.4-1   amd64  
  clustered file-system (server package)


So I tried stepping up through the intermediate versions, on the basis of
http://www.gluster.org/pipermail/gluster-users/2014-November/019666.html:


gfse-rh-01:/var/log/glusterfs# gluster volume set all cluster.op-version 30600
volume set: success
gfse-rh-01:/var/log/glusterfs# gluster volume set all cluster.op-version 30601
volume set: success
gfse-rh-01:/var/log/glusterfs# gluster volume set all cluster.op-version 30602
volume set: success
gfse-rh-01:/var/log/glusterfs# gluster volume set all cluster.op-version 30603
volume set: success
gfse-rh-01:/var/log/glusterfs# gluster volume set all cluster.op-version 30604
volume set: failed: Required op_version (30604) is not supported



Which I guess is closer to where I want to be...
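
To confirm that all three peers now agree, something like this (short hostnames
as above; assumes ssh access to each node) should report the same
operating-version everywhere:

# compare the recorded op-version across the peers
for h in gfse-cant-01 gfse-rh-01 gfse-isr-01; do
    ssh $h grep operating-version /var/lib/glusterd/glusterd.info
done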

I will see if that does what I need - even if it is not quite right...

Thanks

Paul

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Locking failed - since upgrade to 3.6.4

2015-08-03 Thread Atin Mukherjee
Could you check the glusterd log on the other nodes? That would give you a
hint of the exact issue. Also, looking at .cmd_log_history will give you the
interval at which volume status commands are executed. If the gap is in
milliseconds then you are bound to hit this, and it is expected.
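
For example (the file sits in the glusterd log directory; the exact path may
differ by distribution):

# show when the most recent volume status commands came in
grep 'volume status' /var/log/glusterfs/.cmd_log_history | tail -n 20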

-Atin
Sent from one plus one
On Aug 3, 2015 7:32 PM, Osborne, Paul (paul.osbo...@canterbury.ac.uk) wrote:


 Hi,

 Last week I upgraded one of my gluster clusters (3 hosts with bricks as
 replica 3) to 3.6.4 from 3.5.4 and all seemed well.

 Today I am getting reports that locking has failed:


 gfse-cant-01:/var/log/glusterfs# gluster volume status
 Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log file
 for details.
 Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log
 file for details.

 Logs:
 [2015-08-03 13:45:29.974560] E [glusterd-syncop.c:1640:gd_sync_task_begin]
 0-management: Locking Peers Failed.
 [2015-08-03 13:49:48.273159] E [glusterd-syncop.c:105:gd_collate_errors]
 0-: Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log
 file for details.
 [2015-08-03 13:49:48.273778] E [glusterd-syncop.c:105:gd_collate_errors]
 0-: Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log
 file for details.


 I am wondering if this is a new feature due to 3.6.4 or something that has
 gone wrong.

 Restarting gluster entirely (btw, the restart script does not actually
 appear to kill the processes...) resolves the issue, but it then repeats a
 few minutes later, which is rather suboptimal for a running service.

 Googling suggests that there may be simultaneous actions going on that can
 cause a locking issue.

 I know that I have nagios running "volume status volname" for each of my
 volumes on each host every few minutes. However, this is not new: it has
 been in place for the last 8-9 months against 3.5 without issue, so I would
 hope that it is not causing the problem.
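
 In the meantime I could at least wrap the nagios check so that runs on each
 host are jittered and serialised - an untested sketch:

 # hypothetical wrapper around the nagios gluster check (bash)
 sleep $((RANDOM % 30))           # jitter so hosts do not all fire at once
 flock /var/lock/gluster-check \
     gluster volume status "$1"   # serialise concurrent checks on this host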

 I am not sure where to look now tbh.




 Paul Osborne
 Senior Systems Engineer
 Canterbury Christ Church University
 Tel: 01227 782751

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Locking failed - since upgrade to 3.6.4

2015-08-03 Thread Osborne, Paul (paul.osbo...@canterbury.ac.uk)
Hi,


OK, I have tracked through the logs to find which of the hosts apparently has a
lock open:


[2015-08-03 14:55:37.602717] I 
[glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: 
Received status volume req for volume blogs

[2015-08-03 14:51:57.791081] E [glusterd-utils.c:148:glusterd_lock] 
0-management: Unable to get lock for uuid: 
76e4398c-e00a-4f3b-9206-4f885c4e5206, lock held by: 
76e4398c-e00a-4f3b-9206-4f885c4e5206


I have identified the UUID for each peer via gluster peer status and worked
backwards.
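
For reference, the local node's own UUID is not listed in gluster peer status
output; it is recorded in glusterd's info file:

# UUIDs of the other peers
gluster peer status
# this node's own UUID
grep ^UUID /var/lib/glusterd/glusterd.info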

I see that gluster volume clear-locks may clear the locks on the volume - but it
is not clear from the logs what path holds the lock or what kind of lock it is.
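
From the CLI help, the command wants a volume, a path and a lock kind, along the
lines of (volume name taken from the log above; the path, kind and range are
only placeholders):

gluster volume clear-locks blogs / kind blocked inode range 0,0-5

Though as far as I can tell that clears file locks held by clients, rather than
the management lock that glusterd is complaining about here.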

Incidentally, manual testing shows that my clients (using NFS) are still able to
read/write to the volume - it is the volume status and heal checks that are
failing. All of my clients and servers have been sequentially rebooted in the
hope that this would clear any issue - however that does not appear to be the
case.



Thanks

Paul




Paul Osborne
Senior Systems Engineer
Canterbury Christ Church University
Tel: 01227 782751



From: Atin Mukherjee atin.mukherje...@gmail.com
Sent: 03 August 2015 15:22
To: Osborne, Paul (paul.osbo...@canterbury.ac.uk)
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Locking failed - since upgrade to 3.6.4


Could you check the glusterd log on the other nodes? That would give you a hint
of the exact issue. Also, looking at .cmd_log_history will give you the interval
at which volume status commands are executed. If the gap is in milliseconds then
you are bound to hit this, and it is expected.

-Atin
Sent from one plus one

On Aug 3, 2015 7:32 PM, Osborne, Paul (paul.osbo...@canterbury.ac.uk) wrote:

Hi,

Last week I upgraded one of my gluster clusters (3 hosts with bricks as replica 
3) to 3.6.4 from 3.5.4 and all seemed well.

Today I am getting reports that locking has failed:


gfse-cant-01:/var/log/glusterfs# gluster volume status
Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log file for
details.
Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log file for
details.

Logs:
[2015-08-03 13:45:29.974560] E [glusterd-syncop.c:1640:gd_sync_task_begin] 
0-management: Locking Peers Failed.
[2015-08-03 13:49:48.273159] E [glusterd-syncop.c:105:gd_collate_errors] 0-:
Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log file for
details.
[2015-08-03 13:49:48.273778] E [glusterd-syncop.c:105:gd_collate_errors] 0-:
Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log file for
details.


I am wondering if this is a new feature due to 3.6.4 or something that has gone 
wrong.

Restarting gluster entirely (btw, the restart script does not actually appear to
kill the processes...) resolves the issue, but it then repeats a few minutes
later, which is rather suboptimal for a running service.

Googling suggests that there may be simultaneous actions going on that can 
cause a locking issue.

I know that I have nagios running "volume status volname" for each of my volumes
on each host every few minutes. However, this is not new: it has been in place
for the last 8-9 months against 3.5 without issue, so I would hope that it is
not causing the problem.

I am not sure where to look now tbh.




Paul Osborne
Senior Systems Engineer
Canterbury Christ Church University
Tel: 01227 782751
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Locking failed - since upgrade to 3.6.4

2015-08-03 Thread Atin Mukherjee
-Atin
Sent from one plus one
On Aug 3, 2015 8:31 PM, Osborne, Paul (paul.osbo...@canterbury.ac.uk) wrote:

 Hi,


 OK I have tracked through the logs which of the hosts apparently has a
lock open:


 [2015-08-03 14:55:37.602717] I
[glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume blogs

 [2015-08-03 14:51:57.791081] E [glusterd-utils.c:148:glusterd_lock]
0-management: Unable to get lock for uuid:
76e4398c-e00a-4f3b-9206-4f885c4e5206, lock held by:
76e4398c-e00a-4f3b-9206-4f885c4e5206

This indicates that the cluster is still operating at an older op-version. You
would need to bump the op-version up to 30604 using: gluster volume set all
cluster.op-version 30604

 I have identified the UUID for each peer via gluster peer status and
worked backwards.

 I see that gluster volume clear-locks may clear the locks on the volume -
but it is not clear from the logs what path holds the lock or what kind of
lock it is.

 Incidentally, manual testing shows that my clients (using NFS) are still
able to read/write to the volume - it is the volume status and heal checks
that are failing. All of my clients and servers have been sequentially
rebooted in the hope that this would clear any issue - however that does not
appear to be the case.



 Thanks

 Paul




 Paul Osborne
 Senior Systems Engineer
 Canterbury Christ Church University
 Tel: 01227 782751


 
 From: Atin Mukherjee atin.mukherje...@gmail.com
 Sent: 03 August 2015 15:22
 To: Osborne, Paul (paul.osbo...@canterbury.ac.uk)
 Cc: gluster-users@gluster.org
 Subject: Re: [Gluster-users] Locking failed - since upgrade to 3.6.4


 Could you check the glusterd log on the other nodes? That would give you a
hint of the exact issue. Also, looking at .cmd_log_history will give you the
interval at which volume status commands are executed. If the gap is in
milliseconds then you are bound to hit this, and it is expected.

 -Atin
 Sent from one plus one

 On Aug 3, 2015 7:32 PM, Osborne, Paul (paul.osbo...@canterbury.ac.uk) wrote:


 Hi,

 Last week I upgraded one of my gluster clusters (3 hosts with bricks as
replica 3) to 3.6.4 from 3.5.4 and all seemed well.

 Today I am getting reports that locking has failed:


 gfse-cant-01:/var/log/glusterfs# gluster volume status
 Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log
file for details.
 Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log
file for details.

 Logs:
 [2015-08-03 13:45:29.974560] E
[glusterd-syncop.c:1640:gd_sync_task_begin] 0-management: Locking Peers
Failed.
 [2015-08-03 13:49:48.273159] E [glusterd-syncop.c:105:gd_collate_errors]
0-: Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log
file for details.
 [2015-08-03 13:49:48.273778] E [glusterd-syncop.c:105:gd_collate_errors]
0-: Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log
file for details.


 I am wondering if this is a new feature due to 3.6.4 or something that
has gone wrong.

 Restarting gluster entirely (btw, the restart script does not actually
appear to kill the processes...) resolves the issue, but it then repeats a
few minutes later, which is rather suboptimal for a running service.

 Googling suggests that there may be simultaneous actions going on that
can cause a locking issue.

 I know that I have nagios running "volume status volname" for each of my
volumes on each host every few minutes. However, this is not new: it has been
in place for the last 8-9 months against 3.5 without issue, so I would hope
that it is not causing the problem.

 I am not sure where to look now tbh.




 Paul Osborne
 Senior Systems Engineer
 Canterbury Christ Church University
 Tel: 01227 782751
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users