Re: [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
Hi Bruce,

On Jan 24, 2008, at 10:00 AM, J. Bruce Fields wrote:

On Thu, Jan 17, 2008 at 03:23:42PM -0500, J. Bruce Fields wrote: To summarize a phone conversation from today:

On Thu, Jan 17, 2008 at 01:07:02PM -0500, Wendy Cheng wrote:

J. Bruce Fields wrote: Would there be any advantage to enforcing that requirement in the server? (For example, teaching nlm to reject any locking request for a certain filesystem that wasn't sent to a certain server IP.) --b.

It is doable... It could be added into the resume patch that is currently being tested (since the logic is so similar to the per-ip based grace period), which should be out for review no later than next Monday. However, as with any new code added into the system, there are trade-offs. I'm not sure we want to keep enhancing this too much, though.

Sure. And I don't want to make this terribly complicated. The patch looks good, and solves a clear problem. That said, there are a few related problems we'd like to solve:

- We want to be able to move an export to a node with an already active nfs server. Currently that requires restarting all of nfsd on the target node. This is what I understand your next patch fixes.

- In the case of a filesystem that may be mounted from multiple nodes at once, we need to make sure we're not leaving a window that allows other applications to claim locks that nfs clients haven't recovered yet.

- Ideally we'd like this to be possible without making the filesystem block all lock requests during a 90-second grace period; instead it should only have to block those requests that conflict with to-be-recovered locks.

- All this should work for nfsv4, where we eventually also want to allow migration of individual clients, and client-initiated failover.
I absolutely don't want to delay solving this particular problem until all of the above is figured out, but I would like to be reasonably confident that the new user interface can be extended naturally to handle the above cases, or at least that it won't unnecessarily complicate their implementation.

I'll try to sketch an implementation of most of the above in the next week.

Bah. Apologies, this is taking me longer than it should to figure out--I've only barely started writing patches. The basic idea, though:

In practice, it seems that both the unlock_ip and unlock_pathname methods that revoke locks are going to be called together. The two separate calls therefore seem a little redundant. The reason we *need* both is that it's possible that a misconfigured client could grab locks for a (server ip, export) combination that it isn't supposed to. So it makes sense to me to restrict locking from the beginning to prevent that from happening.

Therefore I'd like to add a call at the beginning like:

	echo "192.168.1.1 /exports/example" > /proc/fs/nfsd/start_grace

before any exports are set up, which both starts a grace period, and tells nfsd to allow locks on the filesystem /exports/example only if they're addressed to the server ip 192.168.1.1. Then on shutdown,

	echo 192.168.1.1 > /proc/fs/nfsd/unlock_ip

should be sufficient to guarantee that nfsd/lockd no longer holds locks on /exports/example. (I think Wendy's pretty close to that api already after adding the second method to start grace?)

The other advantage to having the server ip from the start is that at the time we make lock requests to the cluster filesystem, we can tell it that the locks associated with 192.168.1.1 are special: they may migrate as a group to another node, and on node failure they should (if possible) be held to give another node a chance to take them over.

Internally I'd like to have an object like

	struct lock_manager {
		char *lm_name;
		...
	}

for each server ip address.
A pointer to this structure would be passed with each lock request, allowing the filesystem to associate locks to lock_managers. The name would be a string derived from the server ip address that the cluster can compare to match reclaim requests with the locks that they're reclaiming from another node. (And in the NFSv4 case we would eventually also allow lock_managers with single-nfsv4-client, as opposed to server-ip, granularity.)

Does that seem sane?

It does. Though, I'd like to elaborate on the effect of this change on the disk filesystem, and in particular on a cluster filesystem. I know I'm jumping ahead, but I'd like to make sure that it's all going to work well with cluster filesystems.

As part of processing an unlock-by-ip request (from the above example of writing into /proc/fs/nfsd/unlock_ip), nfsd would call the underlying filesystem. In a cluster filesystem we really can't just delete locks, as the filesystem is still available and accessible from other nodes in the cluster. We need to protect
[Cluster-devel] cluster/gnbd-kernel/src gnbd.c
CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	[EMAIL PROTECTED]	2008-01-28 06:13:15

Modified files:
	gnbd-kernel/src: gnbd.c

Log message:
	Update gnbd kernel modules to build with 2.6.24

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gnbd-kernel/src/gnbd.c.diff?cvsroot=cluster&r1=1.20&r2=1.21

--- cluster/gnbd-kernel/src/gnbd.c	2007/06/19 22:31:56	1.20
+++ cluster/gnbd-kernel/src/gnbd.c	2008/01/28 06:13:15	1.21
@@ -168,7 +168,7 @@
 	if (down_trylock(&dev->do_it_lock))
 		return -EBUSY;
-	res = sscanf(buf, "%Lu\n", &size);
+	res = sscanf(buf, "%Lu\n", (unsigned long long int *)&size);
 	if (res != 1){
 		up(&dev->do_it_lock);
 		return -EINVAL;
@@ -267,7 +267,7 @@
 static void gnbd_end_request(struct request *req)
 {
 	int uptodate = (req->errors == 0) ? 1 : 0;
-	request_queue_t *q = req->q;
+	struct request_queue *q = req->q;
 	unsigned long flags;
 
 	dprintk(DBG_BLKDEV, "%s: request %p: %s\n", req->rq_disk->disk_name,
@@ -369,7 +369,7 @@
 int __gnbd_send_req(struct gnbd_device *dev, struct socket *sock,
 		    struct request *req, int can_signal)
 {
-	int result, i, flags;
+	int result, flags;
 	struct gnbd_request request;
 	unsigned long size = req->nr_sectors << 9;
@@ -403,28 +403,26 @@
 	}
 	if (gnbd_cmd(req) == GNBD_CMD_WRITE) {
-		struct bio *bio;
+		struct req_iterator iter;
+		struct bio_vec *bvec;
 		/*
 		 * we are really probing at internals to determine
 		 * whether to set MSG_MORE or not...
 		 */
-		rq_for_each_bio(bio, req) {
-			struct bio_vec *bvec;
-			bio_for_each_segment(bvec, bio, i) {
-				flags = 0;
-				if ((i < (bio->bi_vcnt - 1)) || bio->bi_next)
-					flags = MSG_MORE;
-				dprintk(DBG_TX, "%s: request %p: sending %d bytes data\n",
-					dev->disk->disk_name, req,
-					bvec->bv_len);
-				result = sock_send_bvec(sock, bvec, flags,
-							can_signal);
-				if (result < 0) {
-					printk(KERN_ERR "%s: Send data failed (result %d)\n",
-					       dev->disk->disk_name,
-					       result);
-					goto error_out;
-				}
+		rq_for_each_segment(bvec, req, iter) {
+			flags = 0;
+			if (!rq_iter_last(req, iter))
+				flags = MSG_MORE;
+			dprintk(DBG_TX, "%s: request %p: sending %d bytes data\n",
+				dev->disk->disk_name, req,
+				bvec->bv_len);
+			result = sock_send_bvec(sock, bvec, flags,
+						can_signal);
+			if (result < 0) {
+				printk(KERN_ERR "%s: Send data failed (result %d)\n",
+				       dev->disk->disk_name,
+				       result);
+				goto error_out;
 			}
 		}
 	}
@@ -464,21 +462,19 @@
 int gnbd_recv_req(struct gnbd_device *dev, struct request *req)
 {
 	int result;
-	int i;
-	struct bio *bio;
 
-	rq_for_each_bio(bio, req) {
-		struct bio_vec *bvec;
-		bio_for_each_segment(bvec, bio, i) {
-			result = sock_recv_bvec(dev->sock, bvec);
-			if (result < 0) {
-				printk(KERN_ERR "%s: Receive data failed (result %d)\n",
-				       dev->disk->disk_name,
-				       result);
-				return result;
-			}
-			dprintk(DBG_RX, "%s: request %p: got %d bytes data\n",
-				dev->disk->disk_name, req, bvec->bv_len);
+	struct bio_vec *bvec;
+	struct req_iterator iter;
+
+	rq_for_each_segment(bvec, req, iter) {
+		result = sock_recv_bvec(dev->sock, bvec);
+		if (result < 0) {
+			printk(KERN_ERR "%s: Receive data failed (result %d)\n",
+			       dev->disk->disk_name,
+			       result);
+			return result;
		}
+		dprintk(DBG_RX, "%s: request
[Cluster-devel] cluster/gfs-kernel/src/gfs ops_file.c
CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	[EMAIL PROTECTED]	2008-01-28 06:36:18

Modified files:
	gfs-kernel/src/gfs: ops_file.c

Log message:
	fix gfs for the removal of sendfile and helper functions
	
	Sendfile and helper functions have been removed in 2.6.24. Migrate to
	using splice_read and the generic_file_splice_read helper function.
	
	Signed-off-by: Phillip Lougher [EMAIL PROTECTED]

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/ops_file.c.diff?cvsroot=cluster&r1=1.37&r2=1.38

--- cluster/gfs-kernel/src/gfs/ops_file.c	2008/01/24 20:54:31	1.37
+++ cluster/gfs-kernel/src/gfs/ops_file.c	2008/01/28 06:36:18	1.38
@@ -1633,13 +1633,12 @@
 	return gfs_lm_plock(sdp, name, file, cmd, fl);
 }
 
-#if 0
 /**
- * gfs_sendfile - Send bytes to a file or socket
+ * gfs_splice_read - Send bytes to a file or socket
 * @in_file: The file to read from
 * @out_file: The file to write to
 * @count: The amount of data
- * @offset: The beginning file offset
+ * @ppos: The beginning file offset
 *
 * Outputs: offset - updated according to number of bytes read
 *
@@ -1647,7 +1646,7 @@
 */
 
 static ssize_t
-gfs_sendfile(struct file *in_file, loff_t *offset, size_t count, read_actor_t actor, void __user *target)
+gfs_splice_read(struct file *in_file, loff_t *ppos, struct pipe_inode_info *pipe, size_t count, unsigned int flags)
 {
 	struct gfs_inode *ip = get_v2ip(in_file->f_mapping->host);
 	struct gfs_holder gh;
@@ -1664,7 +1663,7 @@
 	if (gfs_is_jdata(ip))
 		retval = -ENOSYS;
 	else
-		retval = generic_file_sendfile(in_file, offset, count, actor, target);
+		retval = generic_file_splice_read(in_file, ppos, pipe, count, flags);
 
 	gfs_glock_dq(&gh);
 
@@ -1673,7 +1672,6 @@
 	return retval;
 }
-#endif
 
 /**
 * do_flock - Acquire a flock on a file
@@ -1802,7 +1800,7 @@
 	.release = gfs_close,
 	.fsync = gfs_fsync,
 	.lock = gfs_lock,
-/*	.sendfile = gfs_sendfile, */
+	.splice_read = gfs_splice_read,
 	.flock = gfs_flock,
 };
[Cluster-devel] cluster/gfs-kernel/src/gfs glock.c
CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	[EMAIL PROTECTED]	2008-01-28 06:40:25

Modified files:
	gfs-kernel/src/gfs: glock.c

Log message:
	Remove unused variable

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/glock.c.diff?cvsroot=cluster&r1=1.36&r2=1.37

--- cluster/gfs-kernel/src/gfs/glock.c	2008/01/24 20:42:00	1.36
+++ cluster/gfs-kernel/src/gfs/glock.c	2008/01/28 06:40:25	1.37
@@ -1617,7 +1617,6 @@
 	struct gfs_glock *gl = gh->gh_gl;
 	struct gfs_sbd *sdp = gl->gl_sbd;
 	struct gfs_glock_operations *glops = gl->gl_ops;
-	struct list_head *pos;
 
 	atomic_inc(&gl->gl_sbd->sd_glock_dq_calls);
[Cluster-devel] cluster/gfs-kernel/src/gfs ioctl.c
CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	[EMAIL PROTECTED]	2008-01-28 06:42:35

Modified files:
	gfs-kernel/src/gfs: ioctl.c

Log message:
	Fix build warning

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/ioctl.c.diff?cvsroot=cluster&r1=1.18&r2=1.19

--- cluster/gfs-kernel/src/gfs/ioctl.c	2008/01/24 20:54:31	1.18
+++ cluster/gfs-kernel/src/gfs/ioctl.c	2008/01/28 06:42:35	1.19
@@ -1490,7 +1490,7 @@
 	else if (strcmp(arg0, "get_file_meta") == 0)
 		error = gi_get_file_meta(ip, gi);
 	else if (strcmp(arg0, "get_file_meta_quota") == 0)
-		error = gi_get_file_meta(ip->i_sbd->sd_qinode, gi);
+		error = gi_get_file_meta(ip->i_sbd->sd_qinode, gi);
 	else if (strcmp(arg0, "do_file_flush") == 0)
 		error = gi_do_file_flush(ip, gi);
 	else if (strcmp(arg0, "get_hfile_stat") == 0)
[Cluster-devel] cluster configure
CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	[EMAIL PROTECTED]	2008-01-28 06:43:50

Modified files:
	.              : configure

Log message:
	Bump kernel check to 2.6.24

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/configure.diff?cvsroot=cluster&r1=1.47&r2=1.48

--- cluster/configure	2008/01/07 18:58:29	1.47
+++ cluster/configure	2008/01/28 06:43:50	1.48
@@ -27,7 +27,7 @@
 # this should be only the major version without the extra version
 # eg. only the first 3 digits
-my $required_kernelversion = '2.6.23';
+my $required_kernelversion = '2.6.24';
 
 my %options = (
	help => \$help,
[Cluster-devel] cluster/gfs-kernel/src/gfs ops_export.c ops_ex ...
CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	[EMAIL PROTECTED]	2008-01-28 06:29:25

Modified files:
	gfs-kernel/src/gfs: ops_export.c ops_export.h ops_vm.c sys.c

Log message:
	Update gfs to cope with 2.6.24 export op changes and other bits

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/ops_export.c.diff?cvsroot=cluster&r1=1.12&r2=1.13
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/ops_export.h.diff?cvsroot=cluster&r1=1.2&r2=1.3
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/ops_vm.c.diff?cvsroot=cluster&r1=1.8&r2=1.9
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/sys.c.diff?cvsroot=cluster&r1=1.4&r2=1.5

--- cluster/gfs-kernel/src/gfs/ops_export.c	2007/07/23 16:41:46	1.12
+++ cluster/gfs-kernel/src/gfs/ops_export.c	2008/01/28 06:29:25	1.13
@@ -44,49 +44,6 @@
 };
 
 /**
- * gfs_decode_fh -
- * @param1: description
- * @param2: description
- * @param3: description
- *
- * Function description
- *
- * Returns: what is returned
- */
-
-struct dentry *
-gfs_decode_fh(struct super_block *sb, __u32 *fh, int fh_len, int fh_type,
-	      int (*acceptable)(void *context, struct dentry *dentry),
-	      void *context)
-{
-	struct inode_cookie this, parent;
-
-	atomic_inc(&get_v2sdp(sb)->sd_ops_export);
-
-	memset(&parent, 0, sizeof(struct inode_cookie));
-
-	switch (fh_type) {
-	case 6:
-		parent.gen_valid = TRUE;
-		parent.gen = gfs32_to_cpu(fh[5]);
-	case 5:
-		parent.formal_ino = ((uint64_t)gfs32_to_cpu(fh[3])) << 32;
-		parent.formal_ino |= (uint64_t)gfs32_to_cpu(fh[4]);
-	case 3:
-		this.gen_valid = TRUE;
-		this.gen = gfs32_to_cpu(fh[2]);
-		this.formal_ino = ((uint64_t)gfs32_to_cpu(fh[0])) << 32;
-		this.formal_ino |= (uint64_t)gfs32_to_cpu(fh[1]);
-		break;
-	default:
-		return NULL;
-	}
-
-	return gfs_export_ops.find_exported_dentry(sb, &this, &parent,
-						   acceptable, context);
-}
-
-/**
 * gfs_encode_fh -
 * @param1: description
 * @param2: description
@@ -290,10 +247,9 @@
 */
 
 struct dentry *
-gfs_get_dentry(struct super_block *sb, void *inump)
+gfs_get_dentry(struct super_block *sb, struct inode_cookie *cookie)
 {
 	struct gfs_sbd *sdp = get_v2sdp(sb);
-	struct inode_cookie *cookie = (struct inode_cookie *)inump;
 	struct gfs_inum inum;
 	struct gfs_holder i_gh, ri_gh, rgd_gh;
 	struct gfs_rgrpd *rgd;
@@ -406,11 +362,55 @@
 	return ERR_PTR(error);
 }
 
-struct export_operations gfs_export_ops = {
-	.decode_fh = gfs_decode_fh,
+static struct dentry *gfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
+				       int fh_len, int fh_type)
+{
+	struct inode_cookie this;
+	__u32 *fh = fid->raw;
+
+	atomic_inc(&get_v2sdp(sb)->sd_ops_export);
+
+	switch (fh_type) {
+	case 6:
+	case 5:
+	case 3:
+		this.gen_valid = TRUE;
+		this.gen = gfs32_to_cpu(fh[2]);
+		this.formal_ino = ((uint64_t)gfs32_to_cpu(fh[0])) << 32;
+		this.formal_ino |= (uint64_t)gfs32_to_cpu(fh[1]);
+		return gfs_get_dentry(sb, &this);
+	default:
+		return NULL;
+	}
+}
+
+static struct dentry *gfs_fh_to_parent(struct super_block *sb, struct fid *fid,
+				       int fh_len, int fh_type)
+{
+	struct inode_cookie parent;
+	__u32 *fh = fid->raw;
+
+	atomic_inc(&get_v2sdp(sb)->sd_ops_export);
+
+	switch (fh_type) {
+	case 6:
+		parent.gen_valid = TRUE;
+		parent.gen = gfs32_to_cpu(fh[5]);
+	case 5:
+		parent.formal_ino = ((uint64_t)gfs32_to_cpu(fh[3])) << 32;
+		parent.formal_ino |= (uint64_t)gfs32_to_cpu(fh[4]);
+	default:
+		return NULL;
+	}
+
+	return gfs_get_dentry(sb, &parent);
+}
+
+const struct export_operations gfs_export_ops = {
 	.encode_fh = gfs_encode_fh,
+	.fh_to_dentry = gfs_fh_to_dentry,
+	.fh_to_parent = gfs_fh_to_parent,
 	.get_name = gfs_get_name,
 	.get_parent = gfs_get_parent,
-	.get_dentry = gfs_get_dentry,
 };

--- cluster/gfs-kernel/src/gfs/ops_export.h	2006/07/10 23:22:34	1.2
+++ cluster/gfs-kernel/src/gfs/ops_export.h	2008/01/28 06:29:25	1.3
@@ -14,6 +14,6 @@
 #ifndef __OPS_EXPORT_DOT_H__
 #define __OPS_EXPORT_DOT_H__
 
-extern struct export_operations gfs_export_ops;
+extern const struct export_operations gfs_export_ops;
 
 #endif /* __OPS_EXPORT_DOT_H__ */

--- cluster/gfs-kernel/src/gfs/ops_vm.c	2007/07/23 16:41:46	1.8
+++ cluster/gfs-kernel/src/gfs/ops_vm.c	2008/01/28 06:29:25	1.9
@@ -94,9 +94,10 @@
 */
 
 static int
-alloc_page_backing(struct gfs_inode *ip, unsigned long index)
[Cluster-devel] cluster/csnap/src Makefile.bak
CVSROOT:	/cvs/cluster
Module name:	cluster
Changes by:	[EMAIL PROTECTED]	2008-01-28 07:43:42

Removed files:
	csnap/src      : Makefile.bak

Log message:
	Remove obsolete file

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/csnap/src/Makefile.bak.diff?cvsroot=cluster&r1=1.1&r2=NONE