Re: [PATCH] fix problems related to journaling in Reiserfs

2005-09-01 Thread Chris Mason
On Wed, 31 Aug 2005 20:35:52 -0700
Hans Reiser <[EMAIL PROTECTED]> wrote:

> Thanks much Hifumi!
> 
> Chris, please comment on the patch.

The problem is that I'm not always making the inode dirty during the
reiserfs_file_write.  The get_block based write function does an
explicit commit during O_SYNC mode.  I've got a cleanup related to this
for quotas and other things, but I didn't realize it would help O_SYNC
as well.

I'll diff/test against mainline in the morning and send out.

-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][RFC] Ext3 online resizing locking issue (Again)

2005-09-01 Thread Glauber de Oliveira Costa
Hi.

Here is my new attempt at the resize lock issue.
Basically, it goes as follows: 

To ensure that only one resizer is running at a time, I added a global
lock that is acquired in the very beginning of ext3_group_add and
ext3_group_extend. 

lock_super is now only used in ext3_group_add at the moment we alter
s_groups_count, and it is released after the superblock is marked dirty.

In ext3_group_extend, marking the superblock dirty is done outside the
main function, so the caller can either trust the lock to be already held
(remount) or acquire it explicitly (ioctl).

The lock in setup_new_group_blocks was simply removed, since this function
is always called from one of the functions that already hold the lock
(and is thus in a safe environment).
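The single-resizer rule described above (take one global lock at the start, never wait for it, fail with -EBUSY if it is taken) can be modelled in userspace as a sketch. This is not ext3 code: the patch uses a kernel semaphore with down_trylock(), and here a C11 atomic_flag stands in for it; resize_begin() and resize_end() are hypothetical names.

```c
#include <errno.h>
#include <stdatomic.h>

/* Userspace stand-in for the patch's global resize_lock: one flag
 * shared by every would-be resizer. */
static atomic_flag resize_busy = ATOMIC_FLAG_INIT;

/* Hypothetical helper: returns 0 if the caller is now the only resizer,
 * or -EBUSY if a resize is already running. It never sleeps, matching
 * the patch's choice to abort instead of waiting. */
static int resize_begin(void)
{
	if (atomic_flag_test_and_set(&resize_busy))
		return -EBUSY;
	return 0;
}

/* Hypothetical helper: release the flag. Call it only after
 * resize_begin() returned 0 (the patch's up() at exit_put). */
static void resize_end(void)
{
	atomic_flag_clear(&resize_busy);
}
```

Aborting rather than blocking means a second resizer gets an immediate error it can report, instead of silently running after the first one on a filesystem whose size has already changed.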

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>


-- 
=
Glauber de Oliveira Costa
IBM Linux Technology Center - Brazil
[EMAIL PROTECTED]
=
diff -bup linux-2.6.13-orig/fs/ext3/ioctl.c linux/fs/ext3/ioctl.c
--- linux-2.6.13-orig/fs/ext3/ioctl.c   2005-09-01 12:44:31.0 -0300
+++ linux/fs/ext3/ioctl.c   2005-09-01 11:42:22.0 -0300
@@ -207,6 +207,12 @@ flags_err:
return -EFAULT;
 
err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
+   if (!err){
+   lock_super(sb);
+   sb->s_dirt = 1;
+   unlock_super(sb);
+   }
+   
journal_lock_updates(EXT3_SB(sb)->s_journal);
journal_flush(EXT3_SB(sb)->s_journal);
journal_unlock_updates(EXT3_SB(sb)->s_journal);
diff -bup linux-2.6.13-orig/fs/ext3/resize.c linux/fs/ext3/resize.c
--- linux-2.6.13-orig/fs/ext3/resize.c  2005-09-01 12:44:31.0 -0300
+++ linux/fs/ext3/resize.c  2005-09-01 19:30:40.0 -0300
@@ -23,6 +23,8 @@
 #define outside(b, first, last)((b) < (first) || (b) >= (last))
 #define inside(b, first, last) ((b) >= (first) && (b) < (last))
 
+DECLARE_MUTEX(resize_lock);
+
 static int verify_group_input(struct super_block *sb,
  struct ext3_new_group_data *input)
 {
@@ -178,7 +180,6 @@ static int setup_new_group_blocks(struct
if (IS_ERR(handle))
return PTR_ERR(handle);
 
-   lock_super(sb);
if (input->group != sbi->s_groups_count) {
err = -EBUSY;
goto exit_journal;
@@ -194,6 +195,7 @@ static int setup_new_group_blocks(struct
ext3_set_bit(0, bh->b_data);
}
 
+   smp_rmb();
/* Copy all of the GDT blocks into the backup in this group */
for (i = 0, bit = 1, block = start + 1;
 i < gdblocks; i++, block++, bit++) {
@@ -271,7 +273,6 @@ exit_bh:
brelse(bh);
 
 exit_journal:
-   unlock_super(sb);
if ((err2 = ext3_journal_stop(handle)) && !err)
err = err2;
 
@@ -706,6 +707,11 @@ int ext3_group_add(struct super_block *s
int gdb_off, gdb_num;
int err, err2;
 
+   if (unlikely(down_trylock(&resize_lock))){
+   ext3_warning(sb,__FUNCTION__,"multiple resizers run on filesystem. Aborting\n");
+   return -EBUSY;
+   }
+
gdb_num = input->group / EXT3_DESC_PER_BLOCK(sb);
gdb_off = input->group % EXT3_DESC_PER_BLOCK(sb);
 
@@ -753,12 +759,6 @@ int ext3_group_add(struct super_block *s
goto exit_put;
}
 
-   lock_super(sb);
-   if (input->group != EXT3_SB(sb)->s_groups_count) {
-   ext3_warning(sb, __FUNCTION__,
-"multiple resizers run on filesystem!\n");
-   goto exit_journal;
-   }
 
if ((err = ext3_journal_get_write_access(handle, sbi->s_sbh)))
goto exit_journal;
@@ -847,6 +847,7 @@ int ext3_group_add(struct super_block *s
 */
smp_wmb();
 
+   lock_super(sb);
/* Update the global fs size fields */
EXT3_SB(sb)->s_groups_count++;
 
@@ -865,9 +866,9 @@ int ext3_group_add(struct super_block *s
 
ext3_journal_dirty_metadata(handle, EXT3_SB(sb)->s_sbh);
sb->s_dirt = 1;
+   unlock_super(sb);
 
 exit_journal:
-   unlock_super(sb);
if ((err2 = ext3_journal_stop(handle)) && !err)
err = err2;
if (!err) {
@@ -877,6 +878,7 @@ exit_journal:
   primary->b_size);
}
 exit_put:
+   up(&resize_lock);
iput(inode);
return err;
 } /* ext3_group_add */
@@ -901,6 +903,12 @@ int ext3_group_extend(struct super_block
handle_t *handle;
int err, freed_blocks;
 
+
+   if (unlikely(down_trylock(&resize_lock))){ 
+   ext3_warning(sb,__FUNCTION__,"multiple resizers run on filesystem. Aborting\n");
+   return -EBUSY;
+   }
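The patch keeps s_groups_count++ after the existing smp_wmb() in ext3_group_add and adds an smp_rmb() in setup_new_group_blocks. The general publish-then-read pattern that such barrier pairs serve can be sketched in userspace with C11 atomics, where a release store and an acquire load play roughly the roles of smp_wmb() and smp_rmb() (they are somewhat stronger). The array, field and function names are hypothetical, and this is an illustration of the pattern, not ext3 code.

```c
#include <stdatomic.h>

#define MAX_GROUPS 16			/* hypothetical fixed capacity */

struct group_desc {
	unsigned long first_block;	/* hypothetical per-group field */
};

static struct group_desc groups[MAX_GROUPS];
static atomic_uint groups_count = 0;

/* Writer: the caller must already hold the single-resizer lock. */
static int publish_group(unsigned long first_block)
{
	unsigned int n = atomic_load_explicit(&groups_count,
					      memory_order_relaxed);

	if (n >= MAX_GROUPS)
		return -1;
	groups[n].first_block = first_block;	/* fill in the group first... */
	atomic_store_explicit(&groups_count, n + 1,
			      memory_order_release);	/* ...then publish it */
	return 0;
}

/* Reader: first block of the newest published group, or 0 if none.
 * The acquire load guarantees the group data is visible for every
 * index below the count it returns. */
static unsigned long newest_group_start(void)
{
	unsigned int n = atomic_load_explicit(&groups_count,
					      memory_order_acquire);

	return n ? groups[n - 1].first_block : 0;
}
```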

Re: GFS, what's remaining

2005-09-01 Thread Andrew Morton
Alan Cox <[EMAIL PROTECTED]> wrote:
>
> On Iau, 2005-09-01 at 03:59 -0700, Andrew Morton wrote:
> > - Why the kernel needs two clustered filesystems
> 
> So delete reiserfs4, FAT, VFAT, ext2, and all the other "junk". 

Well, we did delete intermezzo.

I was looking for technical reasons, please.

> > - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot
> >   possibly gain (or vice versa)
> > 
> > - Relative merits of the two offerings
> 
> You missed the important one - people actively use it and have been for
> some years. Same reason we have NTFS, HPFS, and all the others. On
> that alone it makes sense to include.

Again, that's not a technical reason.  It's _a_ reason, sure.  But what are
the technical reasons for merging gfs[2], ocfs2, both or neither?

If one can be grown to encompass the capabilities of the other then we're
left with a bunch of legacy code and wasted effort.

I'm not saying it's wrong.  But I'd like to hear the proponents explain why
it's right, please.


RE: [Linux-cluster] Re: GFS, what's remaining

2005-09-01 Thread Hua Zhong \(hzhong\)
I just started looking at gfs. To understand it you'd need to look at it
from the entire cluster solution point of view.

This is a good document from David. It's not about GFS in particular but
about the architecture of the cluster.

http://people.redhat.com/~teigland/sca.pdf 

Hua

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Christoph Hellwig
> Sent: Thursday, September 01, 2005 10:56 AM
> To: Alan Cox
> Cc: Christoph Hellwig; Andrew Morton; 
> linux-fsdevel@vger.kernel.org; [EMAIL PROTECTED]; 
> linux-kernel@vger.kernel.org
> Subject: [Linux-cluster] Re: GFS, what's remaining
> 
> On Thu, Sep 01, 2005 at 04:28:30PM +0100, Alan Cox wrote:
> > > That's GFS.  The submission is about a GFS2 that's on-disk incompatible
> > > to GFS.
> > 
> > Just like say reiserfs3 and reiserfs4 or ext and ext2 or ext2 and ext3
> > then. I think the main point still stands - we have always taken
> > multiple file systems on board and we have benefitted enormously from
> > having the competition between them instead of a dictat from the kernel
> > kremlin that 'foofs is the one true way'
> 
> I didn't say anything against a particular fs, just that your previous
> arguments were utter nonsense.  In fact I think having two or more cluster
> filesystems in the tree is a good thing.  Whether the gfs2 code is mergeable
> is a completely different question, and it seems at least debatable to
> submit a filesystem for inclusion that's still pretty new.
> 
> While we're at it I can't find anything describing what gfs2 is about,
> what is lacking in gfs, what structural changes did you make, etc..
> 
> p.s. why is gfs2 in fs/gfs in the kernel tree?
> 
> --
> Linux-cluster mailing list
> [EMAIL PROTECTED]
> http://www.redhat.com/mailman/listinfo/linux-cluster


Re: GFS, what's remaining

2005-09-01 Thread Christoph Hellwig
On Thu, Sep 01, 2005 at 04:28:30PM +0100, Alan Cox wrote:
> > That's GFS.  The submission is about a GFS2 that's on-disk incompatible
> > to GFS.
> 
> Just like say reiserfs3 and reiserfs4 or ext and ext2 or ext2 and ext3
> then. I think the main point still stands - we have always taken
> multiple file systems on board and we have benefitted enormously from
> having the competition between them instead of a dictat from the kernel
> kremlin that 'foofs is the one true way'

I didn't say anything against a particular fs, just that your previous
arguments were utter nonsense.  In fact I think having two or more cluster
filesystems in the tree is a good thing.  Whether the gfs2 code is mergeable
is a completely different question, and it seems at least debatable to
submit a filesystem for inclusion that's still pretty new.

While we're at it I can't find anything describing what gfs2 is about,
what is lacking in gfs, what structural changes did you make, etc..

p.s. why is gfs2 in fs/gfs in the kernel tree?


Re: GFS, what's remaining

2005-09-01 Thread Daniel Phillips
On Thursday 01 September 2005 06:46, David Teigland wrote:
> I'd like to get a list of specific things remaining for merging.

Where are the benchmarks and stability analysis?  How many hours does it 
survive cerberos running on all nodes simultaneously?  Where are the 
testimonials from users?  How long has there been a gfs2 filesystem?  Note 
that Reiser4 is still not in mainline a year after it was first offered; why 
do you think gfs2 should be in mainline after one month?

So far, all catches are surface things like bogus spinlocks.  Substantive 
issues have not even begun to be addressed.  Patience please, this is going 
to take a while.

Regards,

Daniel


Re: GFS, what's remaining

2005-09-01 Thread Daniel Phillips
On Thursday 01 September 2005 10:49, Alan Cox wrote:
> On Iau, 2005-09-01 at 03:59 -0700, Andrew Morton wrote:
> > - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot
> >   possibly gain (or vice versa)
> >
> > - Relative merits of the two offerings
>
> You missed the important one - people actively use it and have been for
> some years. Same reason we have NTFS, HPFS, and all the others. On
> that alone it makes sense to include.

I thought that gfs2 just appeared last month.  Or is it really still just gfs?  
If there are substantive changes from gfs to gfs2 then obviously they have 
had practically zero testing, let alone posted benchmarks, testimonials, etc.  
If it is really still just gfs then the silly-rename should be undone.

Regards,

Daniel


Re: [PATCH 01/14] GFS: headers

2005-09-01 Thread Jörn Engel
On Thu, 1 September 2005 22:59:48 +0800, David Teigland wrote:
> 
> We offered to remove this when I explained it before.  It sounds like it
> would give you some comfort so I'll just go ahead and do it barring any
> pleas otherwise.

Please do.  Just have one test machine with an endianness different
from the on-disk format.

Having the on-disk format always be big-endian would serve this
purpose quite well, btw.  But buying a (pre-Intel) Apple machine
also would.

Jörn

-- 
More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including
blind stupidity.
-- W. A. Wulf 


Re: GFS, what's remaining

2005-09-01 Thread Lars Marowsky-Bree
On 2005-09-01T16:28:30, Alan Cox <[EMAIL PROTECTED]> wrote:

> Competition will decide if OCFS or GFS is better, or indeed if someone
> comes along with another contender that is better still. And competition
> will probably get the answer right.

Competition will come up with the same situation like reiserfs and ext3
and XFS, namely that they'll all be maintained going forward because of,
uhm, political constraints ;-)

But then, as long as they _are_ maintained and play along nicely with
each other (which, btw, is needed already so that at least data can be
migrated...), I don't really see a problem of having two or three.

> The only thing that is important is we don't end up with each cluster fs
> wanting different core VFS interfaces added.

Indeed.


Sincerely,
Lars Marowsky-Brée <[EMAIL PROTECTED]>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge" -- Charles Darwin



Re: GFS, what's remaining

2005-09-01 Thread Alan Cox
> That's GFS.  The submission is about a GFS2 that's on-disk incompatible
> to GFS.

Just like say reiserfs3 and reiserfs4 or ext and ext2 or ext2 and ext3
then. I think the main point still stands - we have always taken
multiple file systems on board and we have benefitted enormously from
having the competition between them instead of a dictat from the kernel
kremlin that 'foofs is the one true way'

Competition will decide if OCFS or GFS is better, or indeed if someone
comes along with another contender that is better still. And competition
will probably get the answer right.

The only thing that is important is we don't end up with each cluster fs
wanting different core VFS interfaces added.

Alan



Re: [PATCH 01/14] GFS: headers

2005-09-01 Thread David Teigland
On Thu, Sep 01, 2005 at 04:19:34PM +0200, Arjan van de Ven wrote:

> > +/* Endian functions */
> 
> e again why?? 
> Why is this a compiletime hack?
> Either you care about either-endian on disk, at which point it has to be
> a runtime thing, or you make the on disk layout fixed endian, at which
> point you really shouldn't abstract be16_to_cpu etc any further!

Again, on-disk is fixed little endian, so we have for example:

#define gfs2_32_to_cpu le32_to_cpu
#define cpu_to_gfs2_32 cpu_to_le32

To _test_ and _verify_ the endian-handling of the code we can
#define GFS2_ENDIAN_BIG which switches the above to:

#define gfs2_32_to_cpu be32_to_cpu
#define cpu_to_gfs2_32 cpu_to_be32
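The compile-time switch Dave describes can be modelled in userspace as a sketch. The load/store helpers are assumptions standing in for the kernel's le32_to_cpu/be32_to_cpu family: they read and write bytes at a pointer in a fixed order, so the result does not depend on the host CPU. gfs2_load32/gfs2_store32 are hypothetical names; only GFS2_ENDIAN_BIG comes from the message.

```c
#include <stdint.h>

/* Fixed-order byte access, independent of host byte order. */
static inline uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static inline uint32_t load_be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static inline void store_le32(unsigned char *p, uint32_t v)
{
	p[0] = v; p[1] = v >> 8; p[2] = v >> 16; p[3] = v >> 24;
}

static inline void store_be32(unsigned char *p, uint32_t v)
{
	p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}

/* Little endian by default; big endian only in a test build that
 * defines GFS2_ENDIAN_BIG. All on-disk accesses use these two names. */
#ifdef GFS2_ENDIAN_BIG
#define gfs2_load32  load_be32
#define gfs2_store32 store_be32
#else
#define gfs2_load32  load_le32
#define gfs2_store32 store_le32
#endif
```

A build with GFS2_ENDIAN_BIG defined writes images the default build cannot read, which is why such a switch is only useful for exercising the conversion paths in testing.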

We offered to remove this when I explained it before.  It sounds like it
would give you some comfort so I'll just go ahead and do it barring any
pleas otherwise.

Dave



Re: GFS, what's remaining

2005-09-01 Thread Christoph Hellwig
On Thu, Sep 01, 2005 at 03:49:18PM +0100, Alan Cox wrote:
> > - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot
> >   possibly gain (or vice versa)
> > 
> > - Relative merits of the two offerings
> 
> You missed the important one - people actively use it and have been for
> some years. Same reason we have NTFS, HPFS, and all the others. On
> that alone it makes sense to include.

That's GFS.  The submission is about a GFS2 that's on-disk incompatible
to GFS.



Re: [PATCH 01/14] GFS: headers

2005-09-01 Thread viro
On Thu, Sep 01, 2005 at 04:19:34PM +0200, Arjan van de Ven wrote:
> > +/* Endian functions */
> 
> e again why?? 
> Why is this a compiletime hack?
> Either you care about either-endian on disk, at which point it has to be
> a runtime thing, or you make the on disk layout fixed endian, at which
> point you really shouldn't abstract be16_to_cpu etc any further!

Well...  I would disagree with the very end of it (e.g. having on-disk
block number representation declared as __bitwise, so that it wouldn't be
mixed with __be, and having conversion helpers consisting of
static inline u32 foo_to_cpu(foo n)
{
return be32_to_cpu((__force __be32)n);
}
etc.) may be valid techniques, assuming that these objects were passed around
enough to deserve it.

Blanket "let's rename for the sake of renaming" is BS, of course...
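The point of the __bitwise approach is that the checker refuses to mix an on-disk value with a native integer. Sparse is not available in a plain compiler, so the sketch below swaps in a different technique with a similar effect: wrapping the on-disk bytes in a one-member struct, which the C compiler itself will not convert to or from an integer. "foo" is Al's placeholder name; the big-endian byte layout and the helper bodies are assumptions.

```c
#include <stdint.h>

/* Distinct type for an on-disk (big-endian) 32-bit block number. */
typedef struct {
	unsigned char b[4];	/* most significant byte first */
} foo;

static inline foo cpu_to_foo(uint32_t n)
{
	foo f = { { (unsigned char)(n >> 24), (unsigned char)(n >> 16),
		    (unsigned char)(n >> 8), (unsigned char)n } };
	return f;
}

static inline uint32_t foo_to_cpu(foo f)
{
	return (uint32_t)f.b[0] << 24 | (uint32_t)f.b[1] << 16 |
	       (uint32_t)f.b[2] << 8 | (uint32_t)f.b[3];
}

/* uint32_t x = cpu_to_foo(7);   <- rejected: foo is not an integer */
```

The cost is that every access must go through the helpers, which is exactly why Al ties the technique to values that are "passed around enough to deserve it".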


[PATCH 03/13] GFS: directories

2005-09-01 Thread David Teigland
Code that handles directory operations.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 fs/gfs2/dir.c | 2158 ++
 fs/gfs2/dir.h |   51 +
 2 files changed, 2209 insertions(+)

--- a/fs/gfs2/dir.c 1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/dir.c 2005-09-01 17:36:55.180135992 +0800
@@ -0,0 +1,2158 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+/*
+* Implements Extendible Hashing as described in:
+*   "Extendible Hashing" by Fagin, et al in
+* __ACM Trans. on Database Systems__, Sept 1979.
+*
+*
+* Here's the layout of dirents which is essentially the same as that of ext2
+* within a single block. The field de_name_len is the number of bytes
+* actually required for the name (no null terminator). The field de_rec_len
+* is the number of bytes allocated to the dirent. The offset of the next
+* dirent in the block is (dirent + dirent->de_rec_len). When a dirent is
+* deleted, the preceding dirent inherits its allocated space, i.e.
+* prev->de_rec_len += deleted->de_rec_len. Since the next dirent is obtained
+* by adding de_rec_len to the current dirent, this essentially causes the
+* deleted dirent to get jumped over when iterating through all the dirents.
+*
+* When deleting the first dirent in a block, there is no previous dirent so
+* the field de_ino is set to zero to designate it as deleted. When allocating
+* a dirent, gfs2_dirent_alloc iterates through the dirents in a block. If the
+* first dirent has (de_ino == 0) and de_rec_len is large enough, this first
+* dirent is allocated. Otherwise it must go through all the 'used' dirents
+* searching for one in which the amount of total space minus the amount of
+* used space will provide enough space for the new dirent.
+*
+* There are two types of blocks in which dirents reside. In a stuffed dinode,
+* the dirents begin at offset sizeof(struct gfs2_dinode) from the beginning of
+* the block.  In leaves, they begin at offset sizeof(struct gfs2_leaf) from the
+* beginning of the leaf block. The dirents reside in leaves when
+*
+* dip->i_di.di_flags & GFS2_DIF_EXHASH is true
+*
+* Otherwise, the dirents are "linear", within a single stuffed dinode block.
+*
+* When the dirents are in leaves, the actual contents of the directory file are
+* used as an array of 64-bit block pointers pointing to the leaf blocks. The
+* dirents are NOT in the directory file itself. There can be more than one block
+* pointer in the array that points to the same leaf. In fact, when a directory
+* is first converted from linear to exhash, all of the pointers point to the
+* same leaf.
+*
+* When a leaf is completely full, the size of the hash table can be
+* doubled unless it is already at the maximum size which is hard coded into
+* GFS2_DIR_MAX_DEPTH. After that, leaves are chained together in a linked list,
+* but never before the maximum hash table size has been reached.
+*/
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "gfs2.h"
+#include "dir.h"
+#include "glock.h"
+#include "inode.h"
+#include "jdata.h"
+#include "meta_io.h"
+#include "quota.h"
+#include "rgrp.h"
+#include "trans.h"
+
+#define IS_LEAF 1 /* Hashed (leaf) directory */
+#define IS_DINODE   2 /* Linear (stuffed dinode block) directory */
+
+#if 1
+#define gfs2_disk_hash2offset(h) (((uint64_t)(h)) >> 1)
+#define gfs2_dir_offset2hash(p) ((uint32_t)(((uint64_t)(p)) << 1))
+#else
+#define gfs2_disk_hash2offset(h) (((uint64_t)(h)))
+#define gfs2_dir_offset2hash(p) ((uint32_t)(((uint64_t)(p))))
+#endif
+
+typedef int (*leaf_call_t) (struct gfs2_inode *dip,
+   uint32_t index, uint32_t len, uint64_t leaf_no,
+   void *data);
+
+/**
+ * int gfs2_filecmp - Compare two filenames
+ * @file1: The first filename
+ * @file2: The second filename
+ * @len_of_file2: The length of the second file
+ *
+ * This routine compares two filenames and returns TRUE if they are equal.
+ *
+ * Returns: TRUE (!=0) if the files are the same, otherwise FALSE (0).
+ */
+
+int gfs2_filecmp(struct qstr *file1, char *file2, int len_of_file2)
+{
+   if (file1->len != len_of_file2)
+   return FALSE;
+   if (memcmp(file1->name, file2, file1->len))
+   return FALSE;
+   return TRUE;
+}
+
+/**
+ * dirent_first - Return the first dirent
+ * @dip: the directory
+ * @bh: The buffer
+ * @dent: Pointer to list of dirents
+ *
+ * return first dirent whether bh points to leaf or stuffed dinode
+ *
+ * Returns: IS_LEAF, IS_DINODE, or -errno
+ */
+
+static int dirent_first(struct gfs2_inode *dip, struc
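The dirent layout described in the dir.c header comment can be illustrated with a userspace sketch. The header struct, the helper names and the host-byte-order fields are assumptions for illustration only; the real struct gfs2_dirent is not shown in this excerpt. The sketch follows the two rules from the comment: the next dirent starts rec_len bytes after the current one, and deleting a dirent either folds its space into the previous dirent or, for the first dirent in the block, sets its inode number to zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified dirent header (host byte order). */
struct dirent_hdr {
	uint64_t ino;		/* 0 marks a deleted first entry */
	uint16_t rec_len;	/* bytes allocated to this entry */
	uint16_t name_len;	/* bytes actually used by the name */
};

/* memcpy avoids alignment and aliasing problems inside a byte buffer. */
static struct dirent_hdr get_hdr(const unsigned char *blk, size_t off)
{
	struct dirent_hdr h;

	memcpy(&h, blk + off, sizeof(h));
	return h;
}

static void put_hdr(unsigned char *blk, size_t off, const struct dirent_hdr *h)
{
	memcpy(blk + off, h, sizeof(*h));
}

/* Count entries in use by walking the rec_len chain. */
static size_t count_live(const unsigned char *blk, size_t size)
{
	size_t off = 0, live = 0;

	while (off + sizeof(struct dirent_hdr) <= size) {
		struct dirent_hdr h = get_hdr(blk, off);

		if (h.rec_len == 0)
			break;			/* corrupt: avoid looping forever */
		if (h.ino != 0)
			live++;
		off += h.rec_len;		/* next = current + rec_len */
	}
	return live;
}

/* Delete the entry starting at byte offset 'target'; 0 on success. */
static int delete_dirent(unsigned char *blk, size_t size, size_t target)
{
	size_t off = 0, prev = 0;
	int have_prev = 0;

	while (off + sizeof(struct dirent_hdr) <= size) {
		struct dirent_hdr h = get_hdr(blk, off);

		if (h.rec_len == 0)
			return -1;		/* corrupt block */
		if (off == target) {
			if (!have_prev) {
				h.ino = 0;	/* first entry: mark it deleted */
				put_hdr(blk, off, &h);
			} else {
				struct dirent_hdr p = get_hdr(blk, prev);

				p.rec_len += h.rec_len;	/* previous absorbs space */
				put_hdr(blk, prev, &p);
			}
			return 0;
		}
		prev = off;
		have_prev = 1;
		off += h.rec_len;
	}
	return -1;				/* no entry starts at target */
}
```

Because deletion never moves other entries, a walk over the block stays valid after any delete; the freed space simply becomes slack at the end of the previous entry, which the allocation pass described in the comment can reuse.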

[PATCH 04/13] GFS: allocation

2005-09-01 Thread David Teigland
Code that manages block allocation.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 fs/gfs2/bits.c |  179 +++
 fs/gfs2/bits.h |   28 +
 fs/gfs2/rgrp.c | 1374 +
 fs/gfs2/rgrp.h |   62 ++
 4 files changed, 1643 insertions(+)

--- a/fs/gfs2/rgrp.c 1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/rgrp.c 2005-09-01 17:36:55.478090696 +0800
@@ -0,0 +1,1374 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "gfs2.h"
+#include "bits.h"
+#include "glock.h"
+#include "glops.h"
+#include "jdata.h"
+#include "lops.h"
+#include "meta_io.h"
+#include "quota.h"
+#include "rgrp.h"
+#include "super.h"
+#include "trans.h"
+
+/**
+ * gfs2_rgrp_verify - Verify that a resource group is consistent
+ * @sdp: the filesystem
+ * @rgd: the rgrp
+ *
+ */
+
+void gfs2_rgrp_verify(struct gfs2_rgrpd *rgd)
+{
+   struct gfs2_sbd *sdp = rgd->rd_sbd;
+   struct gfs2_bitmap *bi = NULL;
+   uint32_t length = rgd->rd_ri.ri_length;
+   uint32_t count[4], tmp;
+   int buf, x;
+
+   memset(count, 0, 4 * sizeof(uint32_t));
+
+   /* Count # blocks in each of 4 possible allocation states */
+   for (buf = 0; buf < length; buf++) {
+   bi = rgd->rd_bits + buf;
+   for (x = 0; x < 4; x++)
+   count[x] += gfs2_bitcount(rgd,
+ bi->bi_bh->b_data +
+ bi->bi_offset,
+ bi->bi_len, x);
+   }
+
+   if (count[0] != rgd->rd_rg.rg_free) {
+   if (gfs2_consist_rgrpd(rgd))
+   fs_err(sdp, "free data mismatch:  %u != %u\n",
+  count[0], rgd->rd_rg.rg_free);
+   return;
+   }
+
+   tmp = rgd->rd_ri.ri_data -
+   rgd->rd_rg.rg_free -
+   rgd->rd_rg.rg_dinodes;
+   if (count[1] != tmp) {
+   if (gfs2_consist_rgrpd(rgd))
+   fs_err(sdp, "used data mismatch:  %u != %u\n",
+  count[1], tmp);
+   return;
+   }
+
+   if (count[2]) {
+   if (gfs2_consist_rgrpd(rgd))
+   fs_err(sdp, "free metadata mismatch:  %u != 0\n",
+  count[2]);
+   return;
+   }
+
+   if (count[3] != rgd->rd_rg.rg_dinodes) {
+   if (gfs2_consist_rgrpd(rgd))
+   fs_err(sdp, "used metadata mismatch:  %u != %u\n",
+  count[3], rgd->rd_rg.rg_dinodes);
+   return;
+   }
+}
+
+static inline int rgrp_contains_block(struct gfs2_rindex *ri, uint64_t block)
+{
+   uint64_t first = ri->ri_data0;
+   uint64_t last = first + ri->ri_data;
+   return !!(first <= block && block < last);
+}
+
+/**
+ * gfs2_blk2rgrpd - Find resource group for a given data/meta block number
+ * @sdp: The GFS2 superblock
+ * @blk: The data block number
+ *
+ * Returns: The resource group, or NULL if not found
+ */
+
+struct gfs2_rgrpd *gfs2_blk2rgrpd(struct gfs2_sbd *sdp, uint64_t blk)
+{
+   struct gfs2_rgrpd *rgd;
+
+   spin_lock(&sdp->sd_rindex_spin);
+
+   list_for_each_entry(rgd, &sdp->sd_rindex_mru_list, rd_list_mru) {
+   if (rgrp_contains_block(&rgd->rd_ri, blk)) {
+   list_move(&rgd->rd_list_mru, &sdp->sd_rindex_mru_list);
+   spin_unlock(&sdp->sd_rindex_spin);
+   return rgd;
+   }
+   }
+
+   spin_unlock(&sdp->sd_rindex_spin);
+
+   return NULL;
+}
+
+/**
+ * gfs2_rgrpd_get_first - get the first Resource Group in the filesystem
+ * @sdp: The GFS2 superblock
+ *
+ * Returns: The first rgrp in the filesystem
+ */
+
+struct gfs2_rgrpd *gfs2_rgrpd_get_first(struct gfs2_sbd *sdp)
+{
+   gfs2_assert(sdp, !list_empty(&sdp->sd_rindex_list),);
+   return list_entry(sdp->sd_rindex_list.next, struct gfs2_rgrpd, rd_list);
+}
+
+/**
+ * gfs2_rgrpd_get_next - get the next RG
+ * @rgd: A RG
+ *
+ * Returns: The next rgrp
+ */
+
+struct gfs2_rgrpd *gfs2_rgrpd_get_next(struct gfs2_rgrpd *rgd)
+{
+   if (rgd->rd_list.next == &rgd->rd_sbd->sd_rindex_list)
+   return NULL;
+   return list_entry(rgd->rd_list.next, struct gfs2_rgrpd, rd_list);
+}
+
+static void clear_rgrpdi(struct gfs2_sbd *sdp)
+{
+   struct list_head *head;
+   struct gfs2_rgrpd *rgd;
+   struct gfs2_glock *gl;
+
+   spin_lock(&sdp->sd_rindex_spin);
+   sdp-
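gfs2_rgrp_verify() counts how many blocks sit in each of four allocation states and compares the totals with the resource group header. A userspace sketch of that counting follows. It assumes a GFS2-style bitmap of 2 bits per block, 4 blocks per byte, with the lowest-numbered block in the least significant bits; the state numbering follows the order the verify function checks (0 free, 1 used data, 2 free metadata, 3 used metadata/dinode). block_state() and count_state() are hypothetical stand-ins, not the patch's gfs2_bitcount().

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCKS_PER_BYTE	4
#define BITS_PER_BLOCK	2
#define STATE_MASK	0x3

/* State of block 'blk' within bitmap 'map'. */
static unsigned int block_state(const unsigned char *map, uint32_t blk)
{
	unsigned int shift = (blk % BLOCKS_PER_BYTE) * BITS_PER_BLOCK;

	return (map[blk / BLOCKS_PER_BYTE] >> shift) & STATE_MASK;
}

/* Number of blocks covered by the first 'len' bytes that are in 'state'. */
static uint32_t count_state(const unsigned char *map, size_t len,
			    unsigned int state)
{
	uint32_t count = 0;
	size_t i;

	for (i = 0; i < len * BLOCKS_PER_BYTE; i++)
		if (block_state(map, (uint32_t)i) == state)
			count++;
	return count;
}
```

Running count_state() once per state over every bitmap buffer of a resource group yields the four totals that the verify function checks against rg_free, the used-data figure derived from ri_data, zero free metadata, and rg_dinodes.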

Re: GFS, what's remaining

2005-09-01 Thread Alan Cox
On Iau, 2005-09-01 at 03:59 -0700, Andrew Morton wrote:
> - Why the kernel needs two clustered filesystems

So delete reiserfs4, FAT, VFAT, ext2, and all the other "junk". 

> - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot
>   possibly gain (or vice versa)
> 
> - Relative merits of the two offerings

You missed the important one - people actively use it and have been for
some years. Same reason we have NTFS, HPFS, and all the others. On
that alone it makes sense to include.

Alan



Re: [PATCH 01/14] GFS: headers

2005-09-01 Thread Arjan van de Ven

> +#ifndef TRUE
> +#define TRUE 1
> +#endif
> +
> +#ifndef FALSE
> +#define FALSE 0
> +#endif

eh why can't you just use the regular kernel conventions?


> +
> +#define NO_CREATE 0
> +#define CREATE 1
> +
> +#define NO_WAIT 0
> +#define WAIT 1
> +
> +#define NO_FORCE 0
> +#define FORCE 1

these deserve enums
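One possible shape for the enum suggestion is sketched below; the enum names, the GFS2_ prefix and lookup_entry() are hypothetical, not from the patch. Giving each on/off argument its own enum makes a call site read lookup_entry(found, GFS2_CREATE) instead of lookup_entry(found, 1). C compilers still accept a plain integer in these positions, so the gain is mainly readability and symbolic names in debuggers rather than strict type checking.

```c
/* One enum per on/off parameter, replacing the #define pairs. */
enum gfs2_create_mode { GFS2_NO_CREATE = 0, GFS2_CREATE = 1 };
enum gfs2_wait_mode   { GFS2_NO_WAIT = 0,   GFS2_WAIT = 1 };
enum gfs2_force_mode  { GFS2_NO_FORCE = 0,  GFS2_FORCE = 1 };

/* Hypothetical lookup showing the parameter type at the call site. */
static int lookup_entry(int exists, enum gfs2_create_mode create)
{
	if (exists)
		return 0;		/* found */
	if (create == GFS2_CREATE)
		return 1;		/* created */
	return -1;			/* absent and not created */
}
```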


> +
> +/* Actions */
> +#define HIF_MUTEX    0
> +#define HIF_PROMOTE  1
> +#define HIF_DEMOTE   2
> +#define HIF_GREEDY   3
> +
> +/* States */
> +#define HIF_ALLOCED  4
> +#define HIF_DEALLOC  5
> +#define HIF_HOLDER   6
> +#define HIF_FIRST    7
> +#define HIF_RECURSE  8
> +#define HIF_ABORTED  9

enum?


> +#define _GFS2C_(x)   (('G' << 16) | ('2' << 8) | (x))
> +
> +/* Ioctls implemented */
> +
> +#define GFS2_IOCTL_IDENTIFY  _GFS2C_(1)
> +#define GFS2_IOCTL_SUPER _GFS2C_(2)

have you registered these in ioctl.txt?
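The _GFS2C_ numbers are built by hand, so they carry no direction or argument-size information. The conventional Linux encoding uses the _IOR/_IOW/_IOWR macros, whose fields can be decoded again with _IOC_TYPE, _IOC_NR, _IOC_SIZE and _IOC_DIR. The sketch compares the two; struct example_gfs2_args and EXAMPLE_GFS2_IOCTL_SUPER are hypothetical, and it assumes a Linux system with <linux/ioctl.h> available.

```c
#include <linux/ioctl.h>
#include <stdint.h>

/* The patch's ad-hoc encoding, copied as posted. */
#define _GFS2C_(x)   (('G' << 16) | ('2' << 8) | (x))

/* Hypothetical argument struct: fixed-width fields and explicit padding
 * keep its size and layout the same for 32- and 64-bit callers. */
struct example_gfs2_args {
	uint64_t offset;
	uint32_t size;
	uint32_t pad;
};

/* Conventional encoding: direction, type 'G', number 2, argument size. */
#define EXAMPLE_GFS2_IOCTL_SUPER _IOWR('G', 2, struct example_gfs2_args)
```

_GFS2C_(1) evaluates to 0x473201, which the _IOC_* decoders cannot interpret meaningfully, while the _IOWR form records its type letter, number and argument size in the command itself.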


> +
> +struct gfs2_ioctl {
> + unsigned int gi_argc;
> + char **gi_argv;
> +
> +char __user *gi_data;
> + unsigned int gi_size;
> + uint64_t gi_offset;
> +};

what is this for??

> +/* Endian functions */

e again why?? 
Why is this a compiletime hack?
Either you care about either-endian on disk, at which point it has to be
a runtime thing, or you make the on disk layout fixed endian, at which
point you really shouldn't abstract be16_to_cpu etc any further!






[PATCH 01/14] GFS: headers

2005-09-01 Thread David Teigland
Central header files that are widely used.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 fs/gfs2/gfs2.h  |   77 +++
 fs/gfs2/incore.h|  691 +++
 include/linux/gfs2_ioctl.h  |   30 +
 include/linux/gfs2_ondisk.h | 1119 
 4 files changed, 1917 insertions(+)

--- a/fs/gfs2/gfs2.h 1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/gfs2.h 2005-09-01 17:36:55.202132648 +0800
@@ -0,0 +1,77 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#ifndef __GFS2_DOT_H__
+#define __GFS2_DOT_H__
+
+#include 
+
+#include "locking/harness/lm_interface.h"
+#include "lvb.h"
+#include "incore.h"
+#include "util.h"
+
+#ifndef TRUE
+#define TRUE 1
+#endif
+
+#ifndef FALSE
+#define FALSE 0
+#endif
+
+#define NO_CREATE 0
+#define CREATE 1
+
+#define NO_WAIT 0
+#define WAIT 1
+
+#define NO_FORCE 0
+#define FORCE 1
+
+#if (BITS_PER_LONG == 64)
+#define PRIu64 "lu"
+#define PRId64 "ld"
+#define PRIx64 "lx"
+#define PRIX64 "lX"
+#else
+#define PRIu64 "Lu"
+#define PRId64 "Ld"
+#define PRIx64 "Lx"
+#define PRIX64 "LX"
+#endif
+
+/*  Divide num by den.  Round up if there is a remainder.  */
+#define DIV_RU(num, den) (((num) + (den) - 1) / (den))
+#define MAKE_MULT8(x) (((x) + 7) & ~7)
+
+#define GFS2_FAST_NAME_SIZE 8
+
+#define get_v2sdp(sb) ((struct gfs2_sbd *)(sb)->s_fs_info)
+#define set_v2sdp(sb, sdp) (sb)->s_fs_info = (sdp)
+#define get_v2ip(inode) ((struct gfs2_inode *)(inode)->u.generic_ip)
+#define set_v2ip(inode, ip) (inode)->u.generic_ip = (ip)
+#define get_v2fp(file) ((struct gfs2_file *)(file)->private_data)
+#define set_v2fp(file, fp) (file)->private_data = (fp)
+#define get_v2bd(bh) ((struct gfs2_bufdata *)(bh)->b_private)
+#define set_v2bd(bh, bd) (bh)->b_private = (bd)
+#define get_v2db(bh) ((struct gfs2_databuf *)(bh)->b_private)
+#define set_v2db(bh, db) (bh)->b_private = (db)
+
+#define get_transaction ((struct gfs2_trans *)(current->journal_info))
+#define set_transaction(tr) (current->journal_info) = (tr)
+
+#define get_gl2ip(gl) ((struct gfs2_inode *)(gl)->gl_object)
+#define set_gl2ip(gl, ip) (gl)->gl_object = (ip)
+#define get_gl2rgd(gl) ((struct gfs2_rgrpd *)(gl)->gl_object)
+#define set_gl2rgd(gl, rgd) (gl)->gl_object = (rgd)
+#define get_gl2gl(gl) ((struct gfs2_glock *)(gl)->gl_object)
+#define set_gl2gl(gl, gl2) (gl)->gl_object = (gl2)
+
+#endif /* __GFS2_DOT_H__ */
+
--- a/fs/gfs2/incore.h  1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/incore.h  2005-09-01 17:36:55.283120336 +0800
@@ -0,0 +1,691 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#ifndef __INCORE_DOT_H__
+#define __INCORE_DOT_H__
+
+#define DIO_FORCE      0x0001
+#define DIO_CLEAN      0x0002
+#define DIO_DIRTY      0x0004
+#define DIO_START      0x0008
+#define DIO_WAIT       0x0010
+#define DIO_METADATA   0x0020
+#define DIO_DATA       0x0040
+#define DIO_RELEASE    0x0080
+#define DIO_ALL        0x0100
+
+struct gfs2_log_operations;
+struct gfs2_log_element;
+struct gfs2_bitmap;
+struct gfs2_rgrpd;
+struct gfs2_bufdata;
+struct gfs2_databuf;
+struct gfs2_glock_operations;
+struct gfs2_holder;
+struct gfs2_glock;
+struct gfs2_alloc;
+struct gfs2_inode;
+struct gfs2_file;
+struct gfs2_revoke;
+struct gfs2_revoke_replay;
+struct gfs2_unlinked;
+struct gfs2_quota_data;
+struct gfs2_log_buf;
+struct gfs2_trans;
+struct gfs2_ail;
+struct gfs2_jdesc;
+struct gfs2_args;
+struct gfs2_tune;
+struct gfs2_gl_hash_bucket;
+struct gfs2_sbd;
+
+typedef void (*gfs2_glop_bh_t) (struct gfs2_glock *gl, unsigned int ret);
+
+/*
+ * Structure of operations that are associated with each
+ * type of element in the log.
+ */
+
+struct gfs2_log_operations {
+   void (*lo_add) (struct gfs2_sbd *sdp, struct gfs2_log_element *le);
+   void (*lo_incore_commit) (struct gfs2_sbd *sdp, struct gfs2_trans *tr);
+   void (*lo_before_commit) (struct gfs2_sbd *sdp);
+   void (*lo_after_commit) (struct gfs2_sbd *sdp, struct gfs2_ail *ai);
+   void (*lo_before_scan) (struct gfs2_jdesc *jd,
+   struct gfs2_log_header *head, int pass);
+   int (*lo_scan_elements) (struct gfs2_jdesc *jd, unsigned int start,
+struct gfs2_log_descriptor *ld, int pass);
+   void (*lo_after_scan) (struct gfs2_jdesc *jd, int error, int pa

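The two rounding helpers in gfs2.h are easy to misread, so here is a minimal userspace sketch of how they behave. The macros are copied from the hunk above; `blocks_for_bytes()` and `pad_to_8()` are hypothetical wrappers added only for illustration.

```c
#include <assert.h>

/* Copied from the gfs2.h hunk: divide rounding up, and round up to a multiple of 8. */
#define DIV_RU(num, den) (((num) + (den) - 1) / (den))
#define MAKE_MULT8(x) (((x) + 7) & ~7)

/* Hypothetical helper: number of bsize-byte blocks needed to hold len bytes. */
unsigned int blocks_for_bytes(unsigned int len, unsigned int bsize)
{
	assert(bsize > 0);
	return DIV_RU(len, bsize);
}

/* Hypothetical helper: pad a record length up to the next 8-byte boundary. */
unsigned int pad_to_8(unsigned int len)
{
	return MAKE_MULT8(len);
}
```

For example, `blocks_for_bytes(4097, 4096)` yields 2 and `pad_to_8(13)` yields 16. `DIV_RU()` evaluates `den` twice and can overflow when `num + den - 1` exceeds the range of the type, so it is safest with simple, bounded arguments.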
[PATCH 11/13] GFS: lock_harness module

2005-09-01 Thread David Teigland
The lock_harness module allows a gfs file system to connect to a given
lock module.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 fs/gfs2/locking/harness/Makefile   |3 
 fs/gfs2/locking/harness/lm_interface.h |  286 +
 fs/gfs2/locking/harness/main.c |  206 +++
 3 files changed, 495 insertions(+)

diff -urpN a/fs/gfs2/locking/harness/Makefile b/fs/gfs2/locking/harness/Makefile
--- a/fs/gfs2/locking/harness/Makefile  1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/locking/harness/Makefile  2005-09-01 17:23:36.150606944 +0800
@@ -0,0 +1,3 @@
+obj-$(CONFIG_GFS2_FS) += lock_harness.o
+lock_harness-y := main.o
+
diff -urpN a/fs/gfs2/locking/harness/lm_interface.h b/fs/gfs2/locking/harness/lm_interface.h
--- a/fs/gfs2/locking/harness/lm_interface.h  1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/locking/harness/lm_interface.h  2005-09-01 17:23:36.119611656 +0800
@@ -0,0 +1,286 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#ifndef __LM_INTERFACE_DOT_H__
+#define __LM_INTERFACE_DOT_H__
+
+/*
+ * Opaque handles represent the lock module's lockspace structure, the lock
+ * module's lock structures, and GFS's file system (superblock) structure.
+ */
+
+typedef void lm_lockspace_t;
+typedef void lm_lock_t;
+typedef void lm_fsdata_t;
+
+typedef void (*lm_callback_t) (lm_fsdata_t *fsdata, unsigned int type,
+  void *data);
+
+/*
+ * lm_mount() flags
+ *
+ * LM_MFLAG_SPECTATOR
+ * GFS is asking to join the filesystem's lockspace, but it doesn't want to
+ * modify the filesystem.  The lock module shouldn't assign a journal to the FS
+ * mount.  It shouldn't send recovery callbacks to the FS mount.  If the node
+ * dies or withdraws, all locks can be wiped immediately.
+ */
+
+#define LM_MFLAG_SPECTATOR 0x0001
+
+/*
+ * lm_lockstruct flags
+ *
+ * LM_LSFLAG_LOCAL
+ * The lock_nolock module returns LM_LSFLAG_LOCAL to GFS, indicating that GFS
+ * can make single-node optimizations.
+ */
+
+#define LM_LSFLAG_LOCAL    0x0001
+
+/*
+ * lm_lockname types
+ */
+
+#define LM_TYPE_RESERVED   0x00
+#define LM_TYPE_NONDISK    0x01
+#define LM_TYPE_INODE      0x02
+#define LM_TYPE_RGRP       0x03
+#define LM_TYPE_META       0x04
+#define LM_TYPE_IOPEN      0x05
+#define LM_TYPE_FLOCK      0x06
+#define LM_TYPE_PLOCK      0x07
+#define LM_TYPE_QUOTA      0x08
+#define LM_TYPE_JOURNAL    0x09
+
+/*
+ * lm_lock() states
+ *
+ * SHARED is compatible with SHARED, not with DEFERRED or EX.
+ * DEFERRED is compatible with DEFERRED, not with SHARED or EX.
+ */
+
+#define LM_ST_UNLOCKED     0
+#define LM_ST_EXCLUSIVE    1
+#define LM_ST_DEFERRED     2
+#define LM_ST_SHARED       3
+
+/*
+ * lm_lock() flags
+ *
+ * LM_FLAG_TRY
+ * Don't wait to acquire the lock if it can't be granted immediately.
+ *
+ * LM_FLAG_TRY_1CB
+ * Send one blocking callback if TRY is set and the lock is not granted.
+ *
+ * LM_FLAG_NOEXP
+ * GFS sets this flag on lock requests it makes while doing journal recovery.
+ * These special requests should not be blocked due to the recovery like
+ * ordinary locks would be.
+ *
+ * LM_FLAG_ANY
+ * A SHARED request may also be granted in DEFERRED, or a DEFERRED request may
+ * also be granted in SHARED.  The preferred state is whichever is compatible
+ * with other granted locks, or the specified state if no other locks exist.
+ *
+ * LM_FLAG_PRIORITY
+ * Override fairness considerations.  Suppose a lock is held in a shared state
+ * and there is a pending request for the deferred state.  A shared lock
+ * request with the priority flag would be allowed to bypass the deferred
+ * request and directly join the other shared lock.  A shared lock request
+ * without the priority flag might be forced to wait until the deferred
+ * request had acquired and released the lock.
+ */
+
+#define LM_FLAG_TRY        0x0001
+#define LM_FLAG_TRY_1CB    0x0002
+#define LM_FLAG_NOEXP      0x0004
+#define LM_FLAG_ANY        0x0008
+#define LM_FLAG_PRIORITY   0x0010
+
+/*
+ * lm_lock() and lm_async_cb return flags
+ *
+ * LM_OUT_ST_MASK
+ * Masks the lower two bits of lock state in the returned value.
+ *
+ * LM_OUT_CACHEABLE
+ * The lock hasn't been released so GFS can continue to cache data for it.
+ *
+ * LM_OUT_CANCELED
+ * The lock request was canceled.
+ *
+ * LM_OUT_ASYNC
+ * The result of the request will be returned in an LM_CB_ASYNC callback.
+ */
+
+#define LM_OUT_ST_MASK 0x0003
+#define LM_OUT_CACHEABLE   0x000

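The state comment in lm_interface.h defines a small compatibility relation between lock states held on different nodes. A minimal sketch of it as a predicate follows, assuming (the header does not say this) that UNLOCKED conflicts with nothing. `lm_states_compatible()` and `lm_granted_state()` are hypothetical helpers, not part of the lock_harness interface. The constants are copied from the patch; the value of `LM_OUT_CACHEABLE` is truncated in this copy, so the sketch relies only on `LM_OUT_ST_MASK`.

```c
#include <assert.h>

/* Lock states and the state mask, copied from lm_interface.h. */
#define LM_ST_UNLOCKED     0
#define LM_ST_EXCLUSIVE    1
#define LM_ST_DEFERRED     2
#define LM_ST_SHARED       3
#define LM_OUT_ST_MASK     0x0003

/*
 * Hypothetical helper: may one node hold state a while another holds b?
 * SHARED pairs only with SHARED, DEFERRED only with DEFERRED, EXCLUSIVE
 * with nothing.  UNLOCKED is assumed to be compatible with everything.
 */
int lm_states_compatible(unsigned int a, unsigned int b)
{
	assert(a <= LM_ST_SHARED && b <= LM_ST_SHARED);
	if (a == LM_ST_UNLOCKED || b == LM_ST_UNLOCKED)
		return 1;
	if (a == LM_ST_EXCLUSIVE || b == LM_ST_EXCLUSIVE)
		return 0;
	return a == b;	/* SHARED+SHARED or DEFERRED+DEFERRED */
}

/* Extract the granted state from an lm_lock() return value. */
unsigned int lm_granted_state(unsigned int ret)
{
	return ret & LM_OUT_ST_MASK;
}
```

The mask works because the four states fit in the low two bits, leaving the higher bits free for the LM_OUT_* flags.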
[PATCH 12/13] GFS: lock_nolock module

2005-09-01 Thread David Teigland
The lock_nolock module does no inter-node locking and allows gfs to be
used as a local file system.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 fs/gfs2/locking/nolock/Makefile |3 
 fs/gfs2/locking/nolock/main.c   |  267 
 2 files changed, 270 insertions(+)

diff -urpN a/fs/gfs2/locking/nolock/Makefile b/fs/gfs2/locking/nolock/Makefile
--- a/fs/gfs2/locking/nolock/Makefile   1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/locking/nolock/Makefile   2005-09-01 17:23:56.963442912 +0800
@@ -0,0 +1,3 @@
+obj-$(CONFIG_GFS2_FS) += lock_nolock.o
+lock_nolock-y := main.o
+
diff -urpN a/fs/gfs2/locking/nolock/main.c b/fs/gfs2/locking/nolock/main.c
--- a/fs/gfs2/locking/nolock/main.c 1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/locking/nolock/main.c 2005-09-01 17:23:56.952444584 +0800
@@ -0,0 +1,267 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../harness/lm_interface.h"
+
+struct nolock_lockspace {
+   unsigned int nl_lvb_size;
+};
+
+struct lm_lockops nolock_ops;
+
+static int nolock_mount(char *table_name, char *host_data,
+   lm_callback_t cb, lm_fsdata_t *fsdata,
+   unsigned int min_lvb_size, int flags,
+   struct lm_lockstruct *lockstruct)
+{
+   char *c;
+   unsigned int jid;
+   struct nolock_lockspace *nl;
+
+   /* If there is a "jid=" in the hostdata, return that jid.
+  Otherwise, return zero. */
+
+   c = strstr(host_data, "jid=");
+   if (!c)
+   jid = 0;
+   else {
+   c += 4;
+   sscanf(c, "%u", &jid);
+   }
+
+   nl = kmalloc(sizeof(struct nolock_lockspace), GFP_KERNEL);
+   if (!nl)
+   return -ENOMEM;
+
+   memset(nl, 0, sizeof(struct nolock_lockspace));
+   nl->nl_lvb_size = min_lvb_size;
+
+   lockstruct->ls_jid = jid;
+   lockstruct->ls_first = 1;
+   lockstruct->ls_lvb_size = min_lvb_size;
+   lockstruct->ls_lockspace = (lm_lockspace_t *)nl;
+   lockstruct->ls_ops = &nolock_ops;
+   lockstruct->ls_flags = LM_LSFLAG_LOCAL;
+
+   return 0;
+}
+
+static void nolock_others_may_mount(lm_lockspace_t *lockspace)
+{
+}
+
+static void nolock_unmount(lm_lockspace_t *lockspace)
+{
+   struct nolock_lockspace *nl = (struct nolock_lockspace *)lockspace;
+   kfree(nl);
+}
+
+static void nolock_withdraw(lm_lockspace_t *lockspace)
+{
+}
+
+/**
+ * nolock_get_lock - get a lm_lock_t given a description of the lock
+ * @lockspace: the lockspace the lock lives in
+ * @name: the name of the lock
+ * @lockp: return the lm_lock_t here
+ *
+ * Returns: 0 on success, -EXXX on failure
+ */
+
+static int nolock_get_lock(lm_lockspace_t *lockspace, struct lm_lockname *name,
+  lm_lock_t **lockp)
+{
+   *lockp = (lm_lock_t *)lockspace;
+   return 0;
+}
+
+/**
+ * nolock_put_lock - get rid of a lock structure
+ * @lock: the lock to throw away
+ *
+ */
+
+static void nolock_put_lock(lm_lock_t *lock)
+{
+}
+
+/**
+ * nolock_lock - acquire a lock
+ * @lock: the lock to manipulate
+ * @cur_state: the current state
+ * @req_state: the requested state
+ * @flags: modifier flags
+ *
+ * Returns: A bitmap of LM_OUT_*
+ */
+
+static unsigned int nolock_lock(lm_lock_t *lock, unsigned int cur_state,
+   unsigned int req_state, unsigned int flags)
+{
+   return req_state | LM_OUT_CACHEABLE;
+}
+
+/**
+ * nolock_unlock - unlock a lock
+ * @lock: the lock to manipulate
+ * @cur_state: the current state
+ *
+ * Returns: 0
+ */
+
+static unsigned int nolock_unlock(lm_lock_t *lock, unsigned int cur_state)
+{
+   return 0;
+}
+
+static void nolock_cancel(lm_lock_t *lock)
+{
+}
+
+/**
+ * nolock_hold_lvb - hold on to a lock value block
+ * @lock: the lock the LVB is associated with
+ * @lvbp: return the lm_lvb_t here
+ *
+ * Returns: 0 on success, -EXXX on failure
+ */
+
+static int nolock_hold_lvb(lm_lock_t *lock, char **lvbp)
+{
+   struct nolock_lockspace *nl = (struct nolock_lockspace *)lock;
+   int error = 0;
+
+   *lvbp = kmalloc(nl->nl_lvb_size, GFP_KERNEL);
+   if (*lvbp)
+   memset(*lvbp, 0, nl->nl_lvb_size);
+   else
+   error = -ENOMEM;
+
+   return error;
+}
+
+/**
+ * nolock_unhold_lvb - release a LVB
+ * @lock: the lock the LVB is associated with
+ * @lvb: the lock value block
+ *
+ */
+
+static void nolock_unhold_lvb(lm_lock_t *lock, char *lvb)
+{
+   kfree(lvb);
+}
+
+/**
+ * nolock_sync_lvb - sync out the value

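nolock_mount() takes the journal id from an optional `jid=` key in the host data. The userspace sketch below reproduces that lookup with the same `strstr()`/`sscanf()` approach; `parse_jid()` is a hypothetical name. One deliberate difference: in the patch, `jid` is left uninitialized when `jid=` is present but not followed by digits, because the return value of `sscanf()` is not checked, whereas the sketch falls back to 0 in that case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the number following "jid=" in host_data, or 0 when the key is
 * absent or is not followed by a decimal number.
 */
unsigned int parse_jid(const char *host_data)
{
	unsigned int jid = 0;
	const char *c;

	assert(host_data != NULL);
	c = strstr(host_data, "jid=");
	if (c && sscanf(c + 4, "%u", &jid) != 1)
		jid = 0;	/* key present but no digits after it */
	return jid;
}
```

With this behaviour, `parse_jid("foo:jid=12:bar")` returns 12, and both `parse_jid("")` and `parse_jid("jid=x")` return 0.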
[PATCH 10/13] GFS: build and documentation

2005-09-01 Thread David Teigland
Add gfs to the build system and gfs2.txt to Documentation.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 Documentation/filesystems/gfs2.txt |  194 +
 fs/Kconfig |   15 ++
 fs/Makefile|1 
 fs/gfs2/Makefile   |   45 
 4 files changed, 255 insertions(+)

--- a/fs/gfs2/Makefile  1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/Makefile  2005-09-01 17:36:55.572076408 +0800
@@ -0,0 +1,45 @@
+obj-$(CONFIG_GFS2_FS) += gfs2.o
+gfs2-y := \
+   acl.o \
+   bits.o \
+   bmap.o \
+   daemon.o \
+   dir.o \
+   eaops.o \
+   eattr.o \
+   glock.o \
+   glops.o \
+   inode.o \
+   ioctl.o \
+   jdata.o \
+   lm.o \
+   log.o \
+   lops.o \
+   lvb.o \
+   main.o \
+   meta_io.o \
+   mount.o \
+   ondisk.o \
+   ops_address.o \
+   ops_dentry.o \
+   ops_export.o \
+   ops_file.o \
+   ops_fstype.o \
+   ops_inode.o \
+   ops_super.o \
+   ops_vm.o \
+   page.o \
+   quota.o \
+   resize.o \
+   recovery.o \
+   rgrp.o \
+   super.o \
+   sys.o \
+   trans.o \
+   unlinked.o \
+   util.o
+
+obj-$(CONFIG_GFS2_FS) += locking/harness/
+obj-$(CONFIG_GFS2_FS) += locking/nolock/
+obj-$(CONFIG_GFS2_FS) += locking/dlm/
+
--- a/fs/Makefile   2005-09-01 16:59:28.042752800 +0800
+++ b/fs/Makefile   2005-09-01 17:10:11.211976216 +0800
@@ -105,3 +105,4 @@
 obj-$(CONFIG_OCFS2_FS) += ocfs2/
 obj-$(CONFIG_RELAYFS_FS)   += relayfs/
 obj-$(CONFIG_9P_FS)        += 9p/
+obj-$(CONFIG_GFS2_FS)  += gfs2/
--- a/fs/Kconfig  2005-09-01 16:59:28.038753408 +0800
+++ b/fs/Kconfig  2005-09-01 17:09:39.810749928 +0800
@@ -360,6 +360,21 @@
  - POSIX ACLs
  - readpages / writepages (not user visible)
 
+config GFS2_FS
+   tristate "GFS2 file system support"
+   depends on DLM
+   select FS_POSIX_ACL
+   help
+     A cluster filesystem.
+
+     Allows a cluster of computers to simultaneously use a block device
+     that is shared between them (with FC, iSCSI, NBD, etc.).  GFS reads
+     and writes to the block device like a local filesystem, but also uses
+     a lock module to allow the computers to coordinate their I/O so
+     filesystem consistency is maintained.  One of the nifty features of
+     GFS is perfect consistency -- changes made to the filesystem on one
+     machine show up immediately on all other machines in the cluster.
+
 config MINIX_FS
tristate "Minix fs support"
help
--- a/Documentation/filesystems/gfs2.txt  1970-01-01 07:30:00.0 +0730
+++ b/Documentation/filesystems/gfs2.txt  2005-09-01 17:36:55.593073216 +0800
@@ -0,0 +1,194 @@
+Global File System
+--
+
+http://sources.redhat.com/cluster/
+
+GFS is a cluster file system. It allows a cluster of computers to
+simultaneously use a block device that is shared between them (with FC,
+iSCSI, NBD, etc).  GFS reads and writes to the block device like a local
+file system, but also uses a lock module to allow the computers to
+coordinate their I/O so file system consistency is maintained.  One of
+the nifty features of GFS is perfect consistency -- changes made to the
+file system on one machine show up immediately on all other machines in
+the cluster.
+
+GFS uses interchangeable inter-node locking mechanisms.  GFS plugs into one
+side of a module called "lock_harness" and different lock modules can plug
+into the other side of the harness.  Each gfs file system selects the
+appropriate lock module at mount time.  Lock modules include:
+
+  lock_nolock -- does no real locking and allows gfs to be used as a
+  local file system
+
+  lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
+  The dlm is found at linux/drivers/dlm/
+
+In addition to interfacing with an external locking manager, a gfs lock
+module is responsible for interacting with external cluster management
+systems.  Lock_dlm depends on user space cluster management systems found
+at the location above.
+
+To use gfs as a local file system, no external clustering systems are
+needed, simply:
+
+  $ gfs2_mkfs -p lock_nolock -j 1 /dev/block_device
+  $ mount -t gfs2 /dev/block_device /dir
+
+GFS2 is not on-disk compatible with previous versions of GFS.
+
+
+The following man pages can be found at the location above:
+  gfs2_mkfs    to make a filesystem
+  gfs2_fsck    to repair a filesystem
+  gfs2_grow    to expand a filesystem online
+  gfs2_jadd    to add journals to a filesystem online
+  gfs2_tool    to manipulate, examine and tune a filesystem
+  gfs2_quota   to examine and change quota values in a filesystem
+  gfs2_mount   to find mount options
+
+Mount options (from the gfs2_mount man page)
+
+   lockproto=LockModuleName

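The documentation describes each GFS file system picking its lock module by name at mount time (the `lockproto=` option). A minimal sketch of that name-based selection follows. The `struct lock_module` type, the static table and `find_lock_module()` are hypothetical stand-ins; the actual registration calls in lock_harness's main.c are not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a registered lock module (not the harness API). */
struct lock_module {
	const char *name;	/* matched against the lockproto= value */
	int local_only;		/* 1 if the module does no inter-node locking */
};

/* Hypothetical registry holding the two modules named in gfs2.txt. */
static const struct lock_module modules[] = {
	{ "lock_nolock", 1 },
	{ "lock_dlm",    0 },
};

/* Find the module named by a lockproto= mount option; NULL if unknown. */
const struct lock_module *find_lock_module(const char *lockproto)
{
	size_t i;

	assert(lockproto != NULL);
	for (i = 0; i < sizeof(modules) / sizeof(modules[0]); i++)
		if (strcmp(modules[i].name, lockproto) == 0)
			return &modules[i];
	return NULL;
}
```

Keeping the lookup keyed on a string is what lets the same file system code run unchanged on top of either module; an unknown name simply fails the mount.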
[PATCH 05/13] GFS: ea and acl

2005-09-01 Thread David Teigland
Code that handles extended attributes and ACLs.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 fs/gfs2/acl.c   |  313 ++
 fs/gfs2/acl.h   |   37 +
 fs/gfs2/eaops.c |  179 ++
 fs/gfs2/eaops.h |   30 +
 fs/gfs2/eattr.c | 1621 
 fs/gfs2/eattr.h |   91 +++
 6 files changed, 2271 insertions(+)

--- a/fs/gfs2/acl.c 1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/acl.c 2005-09-01 17:36:55.135142832 +0800
@@ -0,0 +1,313 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "gfs2.h"
+#include "acl.h"
+#include "eaops.h"
+#include "eattr.h"
+#include "glock.h"
+#include "inode.h"
+#include "meta_io.h"
+#include "trans.h"
+
+#define ACL_ACCESS 1
+#define ACL_DEFAULT 0
+
+int gfs2_acl_validate_set(struct gfs2_inode *ip, int access,
+ struct gfs2_ea_request *er,
+ int *remove, mode_t *mode)
+{
+   struct posix_acl *acl;
+   int error;
+
+   error = gfs2_acl_validate_remove(ip, access);
+   if (error)
+   return error;
+
+   if (!er->er_data)
+   return -EINVAL;
+
+   acl = posix_acl_from_xattr(er->er_data, er->er_data_len);
+   if (IS_ERR(acl))
+   return PTR_ERR(acl);
+   if (!acl) {
+   *remove = TRUE;
+   return 0;
+   }
+
+   error = posix_acl_valid(acl);
+   if (error)
+   goto out;
+
+   if (access) {
+   error = posix_acl_equiv_mode(acl, mode);
+   if (!error)
+   *remove = TRUE;
+   else if (error > 0)
+   error = 0;
+   }
+
+ out:
+   posix_acl_release(acl);
+
+   return error;
+}
+
+int gfs2_acl_validate_remove(struct gfs2_inode *ip, int access)
+{
+   if (!ip->i_sbd->sd_args.ar_posix_acl)
+   return -EOPNOTSUPP;
+   if (current->fsuid != ip->i_di.di_uid && !capable(CAP_FOWNER))
+   return -EPERM;
+   if (S_ISLNK(ip->i_di.di_mode))
+   return -EOPNOTSUPP;
+   if (!access && !S_ISDIR(ip->i_di.di_mode))
+   return -EACCES;
+
+   return 0;
+}
+
+static int acl_get(struct gfs2_inode *ip, int access, struct posix_acl **acl,
+  struct gfs2_ea_location *el, char **data, unsigned int *len)
+{
+   struct gfs2_ea_request er;
+   struct gfs2_ea_location el_this;
+   int error;
+
+   if (!ip->i_di.di_eattr)
+   return 0;
+
+   memset(&er, 0, sizeof(struct gfs2_ea_request));
+   if (access) {
+   er.er_name = GFS2_POSIX_ACL_ACCESS;
+   er.er_name_len = GFS2_POSIX_ACL_ACCESS_LEN;
+   } else {
+   er.er_name = GFS2_POSIX_ACL_DEFAULT;
+   er.er_name_len = GFS2_POSIX_ACL_DEFAULT_LEN;
+   }
+   er.er_type = GFS2_EATYPE_SYS;
+
+   if (!el)
+   el = &el_this;
+
+   error = gfs2_ea_find(ip, &er, el);
+   if (error)
+   return error;
+   if (!el->el_ea)
+   return 0;
+   if (!GFS2_EA_DATA_LEN(el->el_ea))
+   goto out;
+
+   er.er_data_len = GFS2_EA_DATA_LEN(el->el_ea);
+   er.er_data = kmalloc(er.er_data_len, GFP_KERNEL);
+   error = -ENOMEM;
+   if (!er.er_data)
+   goto out;
+
+   error = gfs2_ea_get_copy(ip, el, er.er_data);
+   if (error)
+   goto out_kfree;
+
+   if (acl) {
+   *acl = posix_acl_from_xattr(er.er_data, er.er_data_len);
+   if (IS_ERR(*acl))
+   error = PTR_ERR(*acl);
+   }
+
+ out_kfree:
+   if (error || !data)
+   kfree(er.er_data);
+   else {
+   *data = er.er_data;
+   *len = er.er_data_len;
+   }
+
+ out:
+   if (error || el == &el_this)
+   brelse(el->el_bh);
+
+   return error;
+}
+
+/**
+ * gfs2_check_acl_locked - Check an ACL to see if we're allowed to do something
+ * @inode: the file we want to do something to
+ * @mask: what we want to do
+ *
+ * Returns: errno
+ */
+
+int gfs2_check_acl_locked(struct inode *inode, int mask)
+{
+   struct posix_acl *acl = NULL;
+   int error;
+
+   error = acl_get(get_v2ip(inode), ACL_ACCESS, &acl, NULL, NULL, NULL);
+   if (error)
+   return error;
+
+   if (acl) {
+   error = posix_acl_permission(inode, acl, mask);
+   posix_acl_release(acl);
+   return error;
+   }
+
+   return -EAGAIN;
+}
+
+int gfs2_check_ac

[PATCH 06/13] GFS: logging and recovery

2005-09-01 Thread David Teigland
A per-node on-disk log is used for recovery.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 fs/gfs2/log.c  |  670 +
 fs/gfs2/log.h  |   68 +
 fs/gfs2/recovery.c |  561 
 fs/gfs2/recovery.h |   32 ++
 4 files changed, 1331 insertions(+)

--- a/fs/gfs2/log.c 1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/log.c 2005-09-01 17:36:55.338111976 +0800
@@ -0,0 +1,670 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "gfs2.h"
+#include "bmap.h"
+#include "glock.h"
+#include "log.h"
+#include "lops.h"
+#include "meta_io.h"
+
+#define PULL 1
+
+static inline int is_done(struct gfs2_sbd *sdp, atomic_t *a)
+{
+   int done;
+   gfs2_log_lock(sdp);
+   done = atomic_read(a) ? FALSE : TRUE;
+   gfs2_log_unlock(sdp);
+   return done;
+}
+
+static void do_lock_wait(struct gfs2_sbd *sdp, wait_queue_head_t *wq,
+atomic_t *a)
+{
+   gfs2_log_unlock(sdp);
+   wait_event(*wq, is_done(sdp, a));
+   gfs2_log_lock(sdp);
+}
+
+static void lock_for_trans(struct gfs2_sbd *sdp)
+{
+   gfs2_log_lock(sdp);
+   do_lock_wait(sdp, &sdp->sd_log_trans_wq, &sdp->sd_log_flush_count);
+   atomic_inc(&sdp->sd_log_trans_count);
+   gfs2_log_unlock(sdp);
+}
+
+static void unlock_from_trans(struct gfs2_sbd *sdp)
+{
+   gfs2_assert_warn(sdp, atomic_read(&sdp->sd_log_trans_count));
+   if (atomic_dec_and_test(&sdp->sd_log_trans_count))
+   wake_up(&sdp->sd_log_flush_wq);
+}
+
+void gfs2_lock_for_flush(struct gfs2_sbd *sdp)
+{
+   gfs2_log_lock(sdp);
+   atomic_inc(&sdp->sd_log_flush_count);
+   do_lock_wait(sdp, &sdp->sd_log_flush_wq, &sdp->sd_log_trans_count);
+   gfs2_log_unlock(sdp);
+}
+
+void gfs2_unlock_from_flush(struct gfs2_sbd *sdp)
+{
+   gfs2_assert_warn(sdp, atomic_read(&sdp->sd_log_flush_count));
+   if (atomic_dec_and_test(&sdp->sd_log_flush_count))
+   wake_up(&sdp->sd_log_trans_wq);
+}
+
+/**
+ * gfs2_struct2blk - compute the number of log descriptor blocks needed
+ * @sdp: the filesystem
+ * @nstruct: the number of structures
+ * @ssize: the size of the structures
+ *
+ * Compute the number of log descriptor blocks needed to hold a certain number
+ * of structures of a certain size.
+ *
+ * Returns: the number of blocks needed (minimum is always 1)
+ */
+
+unsigned int gfs2_struct2blk(struct gfs2_sbd *sdp, unsigned int nstruct,
+unsigned int ssize)
+{
+   unsigned int blks;
+   unsigned int first, second;
+
+   blks = 1;
+   first = (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_log_descriptor)) / ssize;
+
+   if (nstruct > first) {
+   second = (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_meta_header)) / ssize;
+   blks += DIV_RU(nstruct - first, second);
+   }
+
+   return blks;
+}
+
+void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags)
+{
+   struct list_head *head = &sdp->sd_ail1_list;
+   uint64_t sync_gen;
+   struct list_head *first, *tmp;
+   struct gfs2_ail *first_ai, *ai;
+
+   gfs2_log_lock(sdp);
+   if (list_empty(head)) {
+   gfs2_log_unlock(sdp);
+   return;
+   }
+   sync_gen = sdp->sd_ail_sync_gen++;
+
+   first = head->prev;
+   first_ai = list_entry(first, struct gfs2_ail, ai_list);
+   first_ai->ai_sync_gen = sync_gen;
+   gfs2_ail1_start_one(sdp, first_ai);
+
+   if (flags & DIO_ALL)
+   first = NULL;
+
+   for (;;) {
+   if (first &&
+   (head->prev != first ||
+gfs2_ail1_empty_one(sdp, first_ai, 0)))
+   break;
+
+   for (tmp = head->prev; tmp != head; tmp = tmp->prev) {
+   ai = list_entry(tmp, struct gfs2_ail, ai_list);
+   if (ai->ai_sync_gen >= sync_gen)
+   continue;
+   ai->ai_sync_gen = sync_gen;
+   gfs2_ail1_start_one(sdp, ai);
+   break;
+   }
+
+   if (tmp == head)
+   break;
+   }
+
+   gfs2_log_unlock(sdp);
+}
+
+int gfs2_ail1_empty(struct gfs2_sbd *sdp, int flags)
+{
+   struct list_head *head, *tmp, *prev;
+   struct gfs2_ail *ai;
+   int ret;
+
+   gfs2_log_lock(sdp);
+
+   for (head = &sdp->sd_ail1_list, tmp = head->prev, prev = tmp->prev;
+tmp != head;
+tmp = prev, prev = tmp->prev) {
+ 

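gfs2_struct2blk() sizes a run of log descriptor blocks: the first block loses space to the log descriptor header, and each continuation block loses space only to a metadata header. A worked sketch of the same arithmetic follows, with the block size and header sizes passed as parameters instead of coming from the superblock and `sizeof()`. The 72- and 24-byte header sizes used in the example are illustrative values, not the real structure sizes.

```c
#include <assert.h>

#define DIV_RU(num, den) (((num) + (den) - 1) / (den))

/* Same computation as gfs2_struct2blk(), with sizes supplied by the caller. */
unsigned int struct2blk(unsigned int bsize, unsigned int ld_hdr,
			unsigned int meta_hdr, unsigned int nstruct,
			unsigned int ssize)
{
	unsigned int blks = 1;
	unsigned int first, second;

	assert(ld_hdr < bsize && meta_hdr < bsize);
	assert(ssize > 0 && ssize <= bsize - meta_hdr);

	/* The first block carries the log descriptor header. */
	first = (bsize - ld_hdr) / ssize;
	if (nstruct > first) {
		/* Each continuation block carries only a metadata header. */
		second = (bsize - meta_hdr) / ssize;
		blks += DIV_RU(nstruct - first, second);
	}
	return blks;
}
```

With 4096-byte blocks and 8-byte entries, the first block holds (4096 - 72) / 8 = 503 entries and each continuation block holds (4096 - 24) / 8 = 509, so up to 503 entries fit in 1 block, 504 to 1012 need 2, and 1013 need 3.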
[PATCH 07/13] GFS: quotas

2005-09-01 Thread David Teigland
Code that deals with quotas.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 fs/gfs2/lvb.c   |   61 ++
 fs/gfs2/lvb.h   |   28 +
 fs/gfs2/quota.c | 1209 
 fs/gfs2/quota.h |   34 +
 4 files changed, 1332 insertions(+)

--- a/fs/gfs2/quota.c   1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/quota.c   2005-09-01 17:36:55.443096016 +0800
@@ -0,0 +1,1209 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "gfs2.h"
+#include "bmap.h"
+#include "glock.h"
+#include "glops.h"
+#include "jdata.h"
+#include "log.h"
+#include "meta_io.h"
+#include "quota.h"
+#include "rgrp.h"
+#include "super.h"
+#include "trans.h"
+
+#define QUOTA_USER 1
+#define QUOTA_GROUP 0
+
+static uint64_t qd2offset(struct gfs2_quota_data *qd)
+{
+   uint64_t offset;
+
+   offset = 2 * (uint64_t)qd->qd_id + !test_bit(QDF_USER, &qd->qd_flags);
+   offset *= sizeof(struct gfs2_quota);
+
+   return offset;
+}
+
+static int qd_alloc(struct gfs2_sbd *sdp, int user, uint32_t id,
+   struct gfs2_quota_data **qdp)
+{
+   struct gfs2_quota_data *qd;
+   int error;
+
+   qd = kzalloc(sizeof(struct gfs2_quota_data), GFP_KERNEL);
+   if (!qd)
+   return -ENOMEM;
+
+   qd->qd_count = 1;
+   qd->qd_id = id;
+   if (user)
+   set_bit(QDF_USER, &qd->qd_flags);
+   qd->qd_slot = -1;
+
+   error = gfs2_glock_get(sdp, 2 * (uint64_t)id + !user,
+ &gfs2_quota_glops, CREATE, &qd->qd_gl);
+   if (error)
+   goto fail;
+
+   error = gfs2_lvb_hold(qd->qd_gl);
+   gfs2_glock_put(qd->qd_gl);
+   if (error)
+   goto fail;
+
+   *qdp = qd;
+
+   return 0;
+
+ fail:
+   kfree(qd);
+   return error;
+}
+
+static int qd_get(struct gfs2_sbd *sdp, int user, uint32_t id, int create,
+ struct gfs2_quota_data **qdp)
+{
+   struct gfs2_quota_data *qd = NULL, *new_qd = NULL;
+   int error, found;
+
+   *qdp = NULL;
+
+   for (;;) {
+   found = FALSE;
+   spin_lock(&sdp->sd_quota_spin);
+   list_for_each_entry(qd, &sdp->sd_quota_list, qd_list) {
+   if (qd->qd_id == id &&
+   !test_bit(QDF_USER, &qd->qd_flags) == !user) {
+   qd->qd_count++;
+   found = TRUE;
+   break;
+   }
+   }
+
+   if (!found)
+   qd = NULL;
+
+   if (!qd && new_qd) {
+   qd = new_qd;
+   list_add(&qd->qd_list, &sdp->sd_quota_list);
+   atomic_inc(&sdp->sd_quota_count);
+   new_qd = NULL;
+   }
+
+   spin_unlock(&sdp->sd_quota_spin);
+
+   if (qd || !create) {
+   if (new_qd) {
+   gfs2_lvb_unhold(new_qd->qd_gl);
+   kfree(new_qd);
+   }
+   *qdp = qd;
+   return 0;
+   }
+
+   error = qd_alloc(sdp, user, id, &new_qd);
+   if (error)
+   return error;
+   }
+}
+
+static void qd_hold(struct gfs2_quota_data *qd)
+{
+   struct gfs2_sbd *sdp = qd->qd_gl->gl_sbd;
+
+   spin_lock(&sdp->sd_quota_spin);
+   gfs2_assert(sdp, qd->qd_count,);
+   qd->qd_count++;
+   spin_unlock(&sdp->sd_quota_spin);
+}
+
+static void qd_put(struct gfs2_quota_data *qd)
+{
+   struct gfs2_sbd *sdp = qd->qd_gl->gl_sbd;
+   spin_lock(&sdp->sd_quota_spin);
+   gfs2_assert(sdp, qd->qd_count,);
+   if (!--qd->qd_count)
+   qd->qd_last_touched = jiffies;
+   spin_unlock(&sdp->sd_quota_spin);
+}
+
+static int slot_get(struct gfs2_quota_data *qd)
+{
+   struct gfs2_sbd *sdp = qd->qd_gl->gl_sbd;
+   unsigned int c, o = 0, b;
+   unsigned char byte = 0;
+
+   spin_lock(&sdp->sd_quota_spin);
+
+   if (qd->qd_slot_count++) {
+   spin_unlock(&sdp->sd_quota_spin);
+   return 0;
+   }
+
+   for (c = 0; c < sdp->sd_quota_chunks; c++)
+   for (o = 0; o < PAGE_SIZE; o++) {
+   byte = sdp->sd_quota_bitmap[c][o];
+   if (byte != 0xFF)
+   goto found;
+   }
+
+   goto fail;
+
+ found:
+   for (b = 0; b < 8;

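qd2offset() interleaves user and group quota records in the quota file: id N's user record sits at slot 2N and its group record at slot 2N + 1, and qd_alloc() uses the same `2 * id + !user` number to name the quota glock. A minimal sketch of the offset calculation follows; `quota_offset()` is a hypothetical name, `rec_size` stands in for `sizeof(struct gfs2_quota)`, and the 88 used in the example is an arbitrary illustrative value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte offset of a quota record: user records occupy even slots and group
 * records odd slots.  The id is widened to 64 bits before doubling, as in
 * qd2offset(), so large 32-bit ids cannot overflow.
 */
uint64_t quota_offset(uint32_t id, int user, uint64_t rec_size)
{
	uint64_t slot;

	assert(rec_size > 0);
	slot = 2 * (uint64_t)id + !user;	/* user -> 2*id, group -> 2*id + 1 */
	return slot * rec_size;
}
```

With 88-byte records, user id 5 maps to offset 880 and group id 5 to offset 968.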
[PATCH 13/13] GFS: lock_dlm module

2005-09-01 Thread David Teigland
The lock_dlm module uses the DLM in linux/drivers/dlm/ for inter-node
locking.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 fs/gfs2/locking/dlm/Makefile   |3 
 fs/gfs2/locking/dlm/lock.c |  533 +
 fs/gfs2/locking/dlm/lock_dlm.h |  200 +++
 fs/gfs2/locking/dlm/main.c |   62 
 fs/gfs2/locking/dlm/mount.c|  218 
 fs/gfs2/locking/dlm/plock.c|  274 +
 fs/gfs2/locking/dlm/sysfs.c|  283 +
 fs/gfs2/locking/dlm/thread.c   |  355 +++
 include/linux/lock_dlm_plock.h |   40 +++
 9 files changed, 1968 insertions(+)

diff -urpN a/fs/gfs2/locking/dlm/Makefile b/fs/gfs2/locking/dlm/Makefile
--- a/fs/gfs2/locking/dlm/Makefile  1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/locking/dlm/Makefile  2005-09-01 17:48:48.143749048 +0800
@@ -0,0 +1,3 @@
+obj-$(CONFIG_GFS2_FS) += lock_dlm.o
+lock_dlm-y := lock.o main.o mount.o sysfs.o thread.o plock.o
+
diff -urpN a/fs/gfs2/locking/dlm/lock.c b/fs/gfs2/locking/dlm/lock.c
--- a/fs/gfs2/locking/dlm/lock.c  1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/locking/dlm/lock.c  2005-09-01 17:48:48.139749656 +0800
@@ -0,0 +1,533 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#include "lock_dlm.h"
+
+static char junk_lvb[GDLM_LVB_SIZE];
+
+static void queue_complete(struct gdlm_lock *lp)
+{
+   struct gdlm_ls *ls = lp->ls;
+
+   clear_bit(LFL_ACTIVE, &lp->flags);
+
+   spin_lock(&ls->async_lock);
+   list_add_tail(&lp->clist, &ls->complete);
+   spin_unlock(&ls->async_lock);
+   wake_up(&ls->thread_wait);
+}
+
+static inline void gdlm_ast(void *astarg)
+{
+   queue_complete((struct gdlm_lock *) astarg);
+}
+
+static inline void gdlm_bast(void *astarg, int mode)
+{
+   struct gdlm_lock *lp = astarg;
+   struct gdlm_ls *ls = lp->ls;
+
+   if (!mode) {
+   printk("lock_dlm: bast mode zero %x,%"PRIx64"\n",
+   lp->lockname.ln_type, lp->lockname.ln_number);
+   return;
+   }
+
+   spin_lock(&ls->async_lock);
+   if (!lp->bast_mode) {
+   list_add_tail(&lp->blist, &ls->blocking);
+   lp->bast_mode = mode;
+   } else if (lp->bast_mode < mode)
+   lp->bast_mode = mode;
+   spin_unlock(&ls->async_lock);
+   wake_up(&ls->thread_wait);
+}
+
+void gdlm_queue_delayed(struct gdlm_lock *lp)
+{
+   struct gdlm_ls *ls = lp->ls;
+
+   spin_lock(&ls->async_lock);
+   list_add_tail(&lp->delay_list, &ls->delayed);
+   spin_unlock(&ls->async_lock);
+}
+
+/* convert gfs lock-state to dlm lock-mode */
+
+static int16_t make_mode(int16_t lmstate)
+{
+   switch (lmstate) {
+   case LM_ST_UNLOCKED:
+   return DLM_LOCK_NL;
+   case LM_ST_EXCLUSIVE:
+   return DLM_LOCK_EX;
+   case LM_ST_DEFERRED:
+   return DLM_LOCK_CW;
+   case LM_ST_SHARED:
+   return DLM_LOCK_PR;
+   default:
+   GDLM_ASSERT(0, printk("unknown LM state %d\n", lmstate););
+   }
+}
+
+/* convert dlm lock-mode to gfs lock-state */
+
+int16_t gdlm_make_lmstate(int16_t dlmmode)
+{
+   switch (dlmmode) {
+   case DLM_LOCK_IV:
+   case DLM_LOCK_NL:
+   return LM_ST_UNLOCKED;
+   case DLM_LOCK_EX:
+   return LM_ST_EXCLUSIVE;
+   case DLM_LOCK_CW:
+   return LM_ST_DEFERRED;
+   case DLM_LOCK_PR:
+   return LM_ST_SHARED;
+   default:
+   GDLM_ASSERT(0, printk("unknown DLM mode %d\n", dlmmode););
+   }
+}
+
+/* verify agreement with GFS on the current lock state, NB: DLM_LOCK_NL and
+   DLM_LOCK_IV are both considered LM_ST_UNLOCKED by GFS. */
+
+static void check_cur_state(struct gdlm_lock *lp, unsigned int cur_state)
+{
+   int16_t cur = make_mode(cur_state);
+   if (lp->cur != DLM_LOCK_IV)
+       GDLM_ASSERT(lp->cur == cur, printk("%d, %d\n", lp->cur, cur););
+}
+
+static inline unsigned int make_flags(struct gdlm_lock *lp,
+ unsigned int gfs_flags,
+ int16_t cur, int16_t req)
+{
+   unsigned int lkf = 0;
+
+   if (gfs_flags & LM_FLAG_TRY)
+       lkf |= DLM_LKF_NOQUEUE;
+
+   if (gfs_flags & LM_FLAG_TRY_1CB) {
+       lkf |= DLM_LKF_NOQUEUE;
+       lkf |= DLM_LKF_NOQUEUEBAST;
+   }
+
+   if (gfs_flags & LM_FLAG_PRIORITY) {
+       lkf |= DLM_LKF_NOORDER;
+       lkf |= DLM_LKF_HEADQUE;
+   }
+
+   if (gfs_flags & LM_FLAG_ANY) {
+
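The two switch statements in the lock_dlm excerpt, make_mode() and gdlm_make_lmstate(), define a fixed correspondence between GFS lock states and DLM lock modes. Below is a minimal standalone sketch of that correspondence. The enum names and values are hypothetical stand-ins, not the kernel's LM_ST_* or DLM_LOCK_* definitions; only the pairing of states to modes is taken from the patch. The sketch makes one property explicit. DLM_LOCK_IV and DLM_LOCK_NL both map to "unlocked", so GFS state to DLM mode and back is lossless, but the reverse round trip is not.

```c
#include <assert.h>

/* Hypothetical stand-ins for the GFS lock states (LM_ST_*). */
enum lm_state { LM_UNLOCKED, LM_EXCLUSIVE, LM_DEFERRED, LM_SHARED };

/* Hypothetical stand-ins for the DLM modes (DLM_LOCK_*). IV means no lock
 * has been granted yet; CR and PW have no GFS counterpart here. */
enum dlm_mode { DLM_IV = -1, DLM_NL, DLM_CR, DLM_CW, DLM_PR, DLM_PW, DLM_EX };

/* GFS state -> DLM mode, mirroring make_mode() in the patch. */
static enum dlm_mode lm_to_dlm(enum lm_state s)
{
	switch (s) {
	case LM_UNLOCKED:
		return DLM_NL;
	case LM_EXCLUSIVE:
		return DLM_EX;
	case LM_DEFERRED:
		return DLM_CW;
	case LM_SHARED:
		return DLM_PR;
	}
	assert(0 && "unknown LM state");
	return DLM_IV;	/* every path returns a value, even with NDEBUG */
}

/* DLM mode -> GFS state, mirroring gdlm_make_lmstate() in the patch. */
static enum lm_state dlm_to_lm(enum dlm_mode m)
{
	switch (m) {
	case DLM_IV:
	case DLM_NL:
		return LM_UNLOCKED;
	case DLM_EX:
		return LM_EXCLUSIVE;
	case DLM_CW:
		return LM_DEFERRED;
	case DLM_PR:
		return LM_SHARED;
	default:
		break;
	}
	assert(0 && "DLM mode with no GFS equivalent");
	return LM_UNLOCKED;
}
```

The patch's check_cur_state() skips its comparison when lp->cur is DLM_LOCK_IV because of this collapse. make_mode() never produces IV, so an unlocked GFS state would always compare as NL.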

[PATCH 08/13] GFS: mount and tuning options

2005-09-01 Thread David Teigland
There are a variety of mount options, tunable parameters, internal
statistics, and methods of online file system manipulation.

Signed-off-by: Ken Preslan <[EMAIL PROTECTED]>
Signed-off-by: David Teigland <[EMAIL PROTECTED]>

---

 fs/gfs2/ioctl.c  | 1485 +++
 fs/gfs2/ioctl.h  |   15 
 fs/gfs2/mount.c  |  209 +++
 fs/gfs2/mount.h  |   15 
 fs/gfs2/resize.c |  285 ++
 fs/gfs2/resize.h |   19 
 fs/gfs2/sys.c    |  201 +++
 fs/gfs2/sys.h    |   24 
 8 files changed, 2253 insertions(+)

--- a/fs/gfs2/ioctl.c   1970-01-01 07:30:00.0 +0730
+++ b/fs/gfs2/ioctl.c   2005-09-01 17:36:55.321114560 +0800
@@ -0,0 +1,1485 @@
+/*
+ * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
+ * Copyright (C) 2004-2005 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "gfs2.h"
+#include "bmap.h"
+#include "dir.h"
+#include "eattr.h"
+#include "glock.h"
+#include "glops.h"
+#include "inode.h"
+#include "ioctl.h"
+#include "jdata.h"
+#include "log.h"
+#include "meta_io.h"
+#include "quota.h"
+#include "resize.h"
+#include "rgrp.h"
+#include "super.h"
+#include "trans.h"
+
+typedef int (*gi_filler_t) (struct gfs2_inode *ip,
+   struct gfs2_ioctl *gi,
+   char *buf,
+   unsigned int size,
+   unsigned int *count);
+
+#define ARG_SIZE 32
+
+/**
+ * gi_skeleton - Setup a buffer that functions can print into
+ * @ip:
+ * @gi:
+ * @filler:
+ *
+ * Returns: -errno or count of bytes copied to userspace
+ */
+
+static int gi_skeleton(struct gfs2_inode *ip, struct gfs2_ioctl *gi,
+  gi_filler_t filler)
+{
+   unsigned int size = gfs2_tune_get(ip->i_sbd, gt_lockdump_size);
+   char *buf;
+   unsigned int count = 0;
+   int error;
+
+   if (size > gi->gi_size)
+       size = gi->gi_size;
+
+   buf = kmalloc(size, GFP_KERNEL);
+   if (!buf)
+       return -ENOMEM;
+
+   error = filler(ip, gi, buf, size, &count);
+   if (error)
+       goto out;
+
+   if (copy_to_user(gi->gi_data, buf, count + 1))
+       error = -EFAULT;
+   else
+       error = count + 1;
+
+ out:
+   kfree(buf);
+
+   return error;
+}
+
+/**
+ * gi_get_cookie - Return the "cookie" (identifying string) for a
+ *  filesystem mount
+ * @ip:
+ * @gi:
+ * @buf:
+ * @size:
+ * @count:
+ *
+ * Returns: errno
+ */
+
+static int gi_get_cookie(struct gfs2_inode *ip, struct gfs2_ioctl *gi,
+char *buf, unsigned int size, unsigned int *count)
+{
+   int error = -ENOBUFS;
+
+   if (gi->gi_argc != 1)
+       return -EINVAL;
+
+   gfs2_printf("version 0\n");
+   gfs2_printf("%lu", (unsigned long)ip->i_sbd);
+
+   error = 0;
+
+ out:
+   return error;
+}
+
+/**
+ * gi_get_super - Return the "struct gfs2_sb" for a filesystem
+ * @sdp:
+ * @gi:
+ *
+ * Returns: errno
+ */
+
+static int gi_get_super(struct gfs2_sbd *sdp, struct gfs2_ioctl *gi)
+{
+   struct gfs2_holder sb_gh;
+   struct buffer_head *bh;
+   struct gfs2_sb *sb;
+   int error;
+
+   if (gi->gi_argc != 1)
+       return -EINVAL;
+   if (gi->gi_size != sizeof(struct gfs2_sb))
+       return -EINVAL;
+
+   sb = kmalloc(sizeof(struct gfs2_sb), GFP_KERNEL);
+   if (!sb)
+       return -ENOMEM;
+
+   error = gfs2_glock_nq_num(sdp,
+           GFS2_SB_LOCK, &gfs2_meta_glops,
+           LM_ST_SHARED, 0, &sb_gh);
+   if (error)
+       goto out;
+
+   error = gfs2_meta_read(sb_gh.gh_gl,
+           GFS2_SB_ADDR >> sdp->sd_fsb2bb_shift,
+           DIO_START | DIO_WAIT,
+           &bh);
+   if (error) {
+       gfs2_glock_dq_uninit(&sb_gh);
+       goto out;
+   }
+   gfs2_sb_in(sb, bh->b_data);
+   brelse(bh);
+
+   gfs2_glock_dq_uninit(&sb_gh);
+
+   if (copy_to_user(gi->gi_data, sb,
+           sizeof(struct gfs2_sb)))
+       error = -EFAULT;
+   else
+       error = sizeof(struct gfs2_sb);
+
+ out:
+   kfree(sb);
+
+   return error;
+}
+
+/**
+ * gi_get_args - Return the mount arguments
+ * @ip:
+ * @gi:
+ * @buf:
+ * @size:
+ * @count:
+ *
+ * Returns: errno
+ */
+
+static int gi_get_args(struct gfs2_inode *ip, struct gfs2_ioctl *gi,
+  char *buf, unsigned int size, unsigned int *count)
+{
+   struct gfs2_sbd *sdp = ip->i_sbd;
+   struct gfs2_args *args = &sdp->sd_args;
+   int error = -ENOBUFS

Re: GFS, what's remaining

2005-09-01 Thread Pekka Enberg
On 9/1/05, David Teigland <[EMAIL PROTECTED]> wrote:
> - Adapt the vfs so gfs (and other cfs's) don't need to walk vma lists.
>   [cf. ops_file.c:walk_vm(), gfs works fine as is, but some don't like it.]

It works fine only if you don't care about playing well with other
clustered filesystems.

  Pekka


Re: GFS, what's remaining

2005-09-01 Thread Arjan van de Ven
On Thu, 2005-09-01 at 18:46 +0800, David Teigland wrote:
> Hi, this is the latest set of gfs patches, it includes some minor munging
> since the previous set.  Andrew, could this be added to -mm? there's not
> much in the way of pending changes.
> 
> http://redhat.com/~teigland/gfs2/20050901/gfs2-full.patch
> http://redhat.com/~teigland/gfs2/20050901/broken-out/

+static inline void glock_put(struct gfs2_glock *gl)
+{
+   if (atomic_read(&gl->gl_count) == 1)
+       gfs2_glock_schedule_for_reclaim(gl);
+   gfs2_assert(gl->gl_sbd, atomic_read(&gl->gl_count) > 0,);
+   atomic_dec(&gl->gl_count);
+}

this code has a race

what is gfs2_assert() about anyway? please just use BUG_ON directly everywhere
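The race here is a check-then-act on the reference count. atomic_read() and atomic_dec() are each atomic, but other CPUs can change the count between them. Two holders dropping concurrently can both read 2, both decrement, and neither schedules reclaim. Alternatively, a reference taken just after the read can leave reclaim scheduled for a glock that is still in use. Below is a minimal userspace sketch of the usual pattern. It assumes C11 <stdatomic.h> in place of the kernel's atomic_t, and struct obj and schedule_for_reclaim() are hypothetical stand-ins for struct gfs2_glock and gfs2_glock_schedule_for_reclaim(). The decision is made from the value returned by the decrement itself; in the kernel the corresponding building block would be atomic_dec_and_test().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct gfs2_glock: a refcount and a flag. */
struct obj {
	atomic_int count;
	bool reclaim_scheduled;
};

/* Hypothetical stand-in for gfs2_glock_schedule_for_reclaim(). */
static void schedule_for_reclaim(struct obj *o)
{
	o->reclaim_scheduled = true;
}

/*
 * Drop one reference. atomic_fetch_sub() returns the count as it was
 * immediately before this decrement, so "I held the last reference" and
 * releasing it are one indivisible step: exactly one of any number of
 * concurrent puts observes old == 1.
 */
static void obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->count, 1);

	assert(old > 0);	/* plain assertion in place of gfs2_assert()/BUG_ON */
	if (old == 1)
		schedule_for_reclaim(o);
}
```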

+static inline int queue_empty(struct gfs2_glock *gl, struct list_head *head)
+{
+   int empty;
+   spin_lock(&gl->gl_spin);
+   empty = list_empty(head);
+   spin_unlock(&gl->gl_spin);
+   return empty;
+}

that looks like a racy interface to me... if so.. why bother locking at all?
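The spinlock in queue_empty() covers only the list_empty() call. Once gl_spin is dropped, another CPU may add or remove an entry, so the answer may already be stale when the caller acts on it. A race-free pattern keeps the test and the action that depends on it inside one critical section. The sketch below assumes POSIX threads in place of a kernel spinlock, and struct queue, queue_push() and queue_pop() are hypothetical. It illustrates the pattern and is not a proposed GFS2 interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical singly linked FIFO guarded by one mutex. */
struct node {
	struct node *next;
	int value;
};

struct queue {
	pthread_mutex_t lock;
	struct node *head;
	struct node *tail;
};

static void queue_push(struct queue *q, struct node *n)
{
	assert(n != NULL);
	n->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	pthread_mutex_unlock(&q->lock);
}

/*
 * "Is it empty?" and "take the first entry" are decided under the same
 * lock hold, so no other thread can change the answer in between.
 * Returns NULL if the queue was empty at that instant.
 */
static struct node *queue_pop(struct queue *q)
{
	struct node *n;

	pthread_mutex_lock(&q->lock);
	n = q->head;
	if (n) {
		q->head = n->next;
		if (!q->head)
			q->tail = NULL;
		n->next = NULL;
	}
	pthread_mutex_unlock(&q->lock);
	return n;
}
```

A caller that only wants a hint, for example to decide whether to wake a worker, gains nothing from taking the lock just for the emptiness test.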

+void gfs2_glock_hold(struct gfs2_glock *gl)
+{
+   glock_hold(gl);
+}

eh why?

+struct gfs2_holder *gfs2_holder_get(struct gfs2_glock *gl, unsigned int state,
+   int flags, int gfp_flags)
+{
+   struct gfs2_holder *gh;
+
+   gh = kmalloc(sizeof(struct gfs2_holder), GFP_KERNEL | gfp_flags);

this looks odd. Either you take flags or you don't.. this looks really half arsed and thus is really surprising to all callers
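The surprise can be made concrete. Allocation masks of this era are bit sets. GFP_KERNEL already contains the "may sleep", "may start I/O" and "may re-enter the filesystem" bits, so OR-ing a caller's mask onto it can only add bits. A caller passing GFP_NOFS (no filesystem recursion) or GFP_ATOMIC (no sleeping) would still get an allocation allowed to do both. The sketch below models this with hypothetical SK_GFP_* constants laid out after the 2.6-era definitions; they are not the real kernel values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits, modelled on the 2.6-era __GFP_* flags. */
#define SK_GFP_WAIT	(1u << 0)	/* allocator may sleep */
#define SK_GFP_HIGH	(1u << 1)	/* may use emergency reserves */
#define SK_GFP_IO	(1u << 2)	/* may start disk I/O */
#define SK_GFP_FS	(1u << 3)	/* may call back into filesystems */

/* Composite masks, following the same era's definitions. */
#define SK_GFP_ATOMIC	(SK_GFP_HIGH)
#define SK_GFP_NOFS	(SK_GFP_WAIT | SK_GFP_IO)
#define SK_GFP_KERNEL	(SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* The mask gfs2_holder_get() effectively hands to kmalloc(). */
static unsigned int holder_get_mask(unsigned int caller_flags)
{
	return SK_GFP_KERNEL | caller_flags;	/* OR adds bits, never clears them */
}

static bool may_sleep(unsigned int mask)
{
	return (mask & SK_GFP_WAIT) != 0;
}

static bool may_enter_fs(unsigned int mask)
{
	return (mask & SK_GFP_FS) != 0;
}
```

There are two consistent choices. One is to take the whole mask from the caller (kmalloc(size, gfp_flags)). The other is to take no mask and always use GFP_KERNEL.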


+static int gi_skeleton(struct gfs2_inode *ip, struct gfs2_ioctl *gi,
+  gi_filler_t filler)
+{
+   unsigned int size = gfs2_tune_get(ip->i_sbd, gt_lockdump_size);
+   char *buf;
+   unsigned int count = 0;
+   int error;
+
+   if (size > gi->gi_size)
+       size = gi->gi_size;
+
+   buf = kmalloc(size, GFP_KERNEL);
+   if (!buf)
+       return -ENOMEM;
+
+   error = filler(ip, gi, buf, size, &count);
+   if (error)
+       goto out;
+
+   if (copy_to_user(gi->gi_data, buf, count + 1))
+       error = -EFAULT;

where does count get a sensible value?
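In the patch, count starts at 0 and can only change inside the filler. That presumably happens through gfs2_printf(), whose definition is not in this excerpt. It is likely a macro, since gi_get_cookie() calls it without buf or count arguments and has an otherwise unreferenced out: label. For copy_to_user(gi->gi_data, buf, count + 1) to stay inside the size-byte buffer, the filler must keep count < size with a NUL at buf[count]. Below is a minimal sketch of a bounded appender with that invariant. buf_printf() and its -1 return, which a caller would map to -ENOBUFS, are hypothetical and not the GFS2 macro.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Append formatted text at buf + *count. On success the invariant
 * *count < size && buf[*count] == '\0' holds, so copying *count + 1
 * bytes (text plus terminator) never reads past the buffer.
 * Returns -1 without advancing *count when the text does not fit.
 */
static int buf_printf(char *buf, unsigned int size, unsigned int *count,
		      const char *fmt, ...)
{
	unsigned int room;
	va_list ap;
	int n;

	assert(size > 0 && *count < size);
	room = size - *count;

	va_start(ap, fmt);
	n = vsnprintf(buf + *count, room, fmt, ap);
	va_end(ap);

	/* vsnprintf() returns the length it needed; n >= room means truncated */
	if (n < 0 || (unsigned int)n >= room) {
		buf[*count] = '\0';	/* discard the partial write */
		return -1;
	}
	*count += (unsigned int)n;
	return 0;
}
```

Under this invariant the worst case is count == size - 1, and then exactly size bytes are copied.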

+static unsigned int handle_roll(atomic_t *a)
+{
+   int x = atomic_read(a);
+   if (x < 0) {
+       atomic_set(a, 0);
+       return 0;
+   }
+   return (unsigned int)x;
+}

this is just plain scary.
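One concrete problem in the quoted handle_roll() is that atomic_read() and atomic_set() are two separate operations. Any update another CPU makes between them is silently overwritten with 0. Suppose two callers both read a negative value, the first resets to 0, and an increment then lands. The second caller's atomic_set() discards that increment. If clamping a wrapped statistic to zero is really wanted, a compare-and-exchange loop at least makes the reset conditional on the value not having changed since it was read. The sketch below assumes C11 <stdatomic.h> rather than the kernel's atomic_t, and counter_read_clamped() is a hypothetical name. Another option is an unsigned or wider counter, which needs no "roll" handling at all.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Return the counter as unsigned, resetting it to 0 if it has gone
 * negative. The reset only happens if *a still holds the negative value
 * we looked at; if another thread changed it first, the CAS fails, x is
 * reloaded with the current value, and the sign test is repeated.
 */
static unsigned int counter_read_clamped(atomic_int *a)
{
	int x = atomic_load(a);

	while (x < 0) {
		if (atomic_compare_exchange_weak(a, &x, 0))
			return 0;
	}
	return (unsigned int)x;
}
```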


you'll have to post the rest of your patches if you want anyone to look at them...





Re: GFS, what's remaining

2005-09-01 Thread Andrew Morton
David Teigland <[EMAIL PROTECTED]> wrote:
>
> Hi, this is the latest set of gfs patches, it includes some minor munging
>  since the previous set.  Andrew, could this be added to -mm?

Dumb question: why?

Maybe I was asleep, but I don't recall seeing much discussion or exposition of

- Why the kernel needs two clustered filesystems

- Why GFS is better than OCFS2, or has functionality which OCFS2 cannot
  possibly gain (or vice versa)

- Relative merits of the two offerings

etc.

Maybe this has all been thrashed out and agreed to.  If so, please remind me.


Re: GFS, what's remaining

2005-09-01 Thread Arjan van de Ven
On Thu, 2005-09-01 at 18:46 +0800, David Teigland wrote:
> Hi, this is the latest set of gfs patches, it includes some minor munging
> since the previous set.  Andrew, could this be added to -mm? there's not
> much in the way of pending changes.

can you post them here instead so that they can be actually reviewed?





GFS, what's remaining

2005-09-01 Thread David Teigland
Hi, this is the latest set of gfs patches, it includes some minor munging
since the previous set.  Andrew, could this be added to -mm? there's not
much in the way of pending changes.

http://redhat.com/~teigland/gfs2/20050901/gfs2-full.patch
http://redhat.com/~teigland/gfs2/20050901/broken-out/

I'd like to get a list of specific things remaining for merging.  I
believe we've responded to everything from earlier reviews, they were very
helpful and more would be excellent.  The list begins with one item from
before that's still pending:

- Adapt the vfs so gfs (and other cfs's) don't need to walk vma lists.
  [cf. ops_file.c:walk_vm(), gfs works fine as is, but some don't like it.]
...

Thanks
Dave
