Hi,
In our two-node system, if one node fails, the other node takes over the
application and uses the shared gfs2 target successfully. However, after the
failed node comes back, any attempt to lock files on the gfs2 resource results
in -ENOSYS. The following test program exhibits the problem. In normal
operation the lock succeeds, but in the fail/recover scenario we get -ENOSYS:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        int fd;
        struct flock fl;

        fd = open("/mnt/test.file", O_RDONLY);
        if (fd != -1) {
                if (fcntl(fd, F_SETFL, O_RDONLY|O_DSYNC) != -1) {
                        fl.l_type = F_RDLCK;
                        fl.l_whence = SEEK_SET;
                        fl.l_start = 0;
                        fl.l_len = 0;
                        if (fcntl(fd, F_SETLK, &fl) != -1)
                                printf("File locked successfully\n");
                        else
                                perror("fcntl(F_SETLK)");
                } else
                        perror("fcntl(F_SETFL)");
                close(fd);
        } else
                perror("open");
}

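To tell the ENOSYS case apart from ordinary lock contention when re-running the test, a helper along these lines might be useful. This is only a sketch: try_read_lock(), lock_result_name() and report_lock() are hypothetical names, not part of any gfs2 or dlm tooling, and the lock request is the same whole-file F_RDLCK used in the test program.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try a non-blocking whole-file read lock.
 * Returns 0 on success, otherwise the errno value set by fcntl(). */
static int try_read_lock(int fd)
{
        struct flock fl;

        memset(&fl, 0, sizeof(fl));     /* no indeterminate fields */
        fl.l_type = F_RDLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;                   /* 0 means "to end of file" */

        if (fcntl(fd, F_SETLK, &fl) == -1)
                return errno;
        return 0;
}

/* Hypothetical helper: turn the result into a short description. */
static const char *lock_result_name(int err)
{
        switch (err) {
        case 0:
                return "locked";
        case EACCES:
        case EAGAIN:
                return "held by another process";
        case ENOSYS:
                return "locking not implemented (ENOSYS)";
        default:
                return strerror(err);
        }
}

/* Print the outcome for one descriptor. */
static void report_lock(int fd)
{
        printf("lock: %s\n", lock_result_name(try_read_lock(fd)));
}
```

Calling report_lock(fd) on a descriptor opened on the gfs2 mount, before and after the failed node rejoins, should make it obvious whether the failure is ENOSYS or a conflicting lock.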
I've tracked things down to these messages:
1409631951 lockspace lvclusdidiz0360 plock disabled our sig 816fba01 nodeid 2 sig 2f6b
:
1409634840 lockspace lvclusdidiz0360 plock disabled our sig 0 nodeid 2 sig 2f6b
These indicate that the lockspace attribute disable_plock has been set as a
result of the other node calling send_plocks_stored().
Looking at cpg.c:

static void prepare_plocks(struct lockspace *ls)
{
        struct change *cg = list_first_entry(&ls->changes, struct change, list);
        struct member *memb;
        uint32_t sig;
        :
        :
        :
        if (nodes_added(ls))
                store_plocks(ls, &sig);
        send_plocks_stored(ls, sig);
}

If nodes_added(ls) returns false then an uninitialized "sig" value will be
passed to send_plocks_stored(). Do the "our sig" and "sig" values in the above
log messages make sense?
If this is not the case, what is supposed to happen in order to re-enable
plocks on the recovered node?
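
To make the concern about "sig" concrete, here is a stripped-down sketch of the same control flow with the variable given a defined starting value. Everything in it is a stand-in: struct lockspace_sketch, the *_sketch functions and the 0x1234 value are invented for illustration and are not the dlm_controld definitions. Whether 0 is the right value to send when no plocks were stored is an assumption on my part; the sketch only shows that initializing removes the indeterminate value.

```c
#include <stdint.h>

/* Stand-in for illustration only, NOT the real struct lockspace. */
struct lockspace_sketch {
        int nodes_added;        /* hypothetical flag replacing nodes_added(ls) */
        uint32_t last_sent_sig; /* records what the send function received */
};

/* Stand-in for store_plocks(): writes an arbitrary placeholder signature. */
static void store_plocks_sketch(struct lockspace_sketch *ls, uint32_t *sig)
{
        (void)ls;
        *sig = 0x1234;
}

/* Stand-in for send_plocks_stored(): just remembers the value passed in. */
static void send_plocks_stored_sketch(struct lockspace_sketch *ls, uint32_t sig)
{
        ls->last_sent_sig = sig;
}

static void prepare_plocks_sketch(struct lockspace_sketch *ls)
{
        uint32_t sig = 0;       /* defined even when nothing is stored */

        if (ls->nodes_added)
                store_plocks_sketch(ls, &sig);
        send_plocks_stored_sketch(ls, sig);
}
```

With the initializer in place, the value sent is 0 when no nodes were added and the stored signature otherwise; without it, the first case passes whatever happened to be on the stack.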
Neale
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster