Re: files of size larger than fs size

2005-03-17 Thread Bryan Henderson
>The problem appears to be mixing calls to lseek64 with calls to fread
>and fwrite.

Oh, of course.  I didn't see that.  You can't use the file descriptor of a 
file that is opened as a stream.  This test case uses the fileno() 
function to mess with the internals of the stream.

fseeko64() is the proper function to position a stream.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: files of size larger than fs size

2005-03-17 Thread Dave Kleikamp
On Thu, 2005-03-17 at 13:48 -0800, Max wrote:
> Dave,
> 
> Shouldn't "fread(&data,sizeof(data),1,f)" and "read(fn, &data, 
> sizeof(data))" produce identical results?
> Is it a bug or what?

fseek(f, offset, SEEK_SET);
fread(&data, sizeof(data, 1, f);

should produce identical results to

lseek(fn, offset, SEEK_SET);
read(fn, &data, sizeof(data);

However

lseek(fn, offset, SEEK_SET);
fread(&data, sizeof(data, 1, f);

isn't valid.

-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] NFS: add I/O performance counters

2005-03-17 Thread Andrew Morton
[EMAIL PROTECTED] (Chuck Lever) wrote:
>
> +static inline void nfs_inc_stats(struct inode *inode, unsigned int stat)
> +{
> + struct nfs_iostats *iostats = NFS_SERVER(inode)->io_stats;
> + iostats[smp_processor_id()].counts[stat]++;
> +}

The use of smp_processor_id() outside locks should spit a runtime warning. 
And it is racy: if you switch CPUs between the read and the write (via
preemption), the stats will be corrupted.

A preempt_disable()/enable() will fix those things up.

> +static inline struct nfs_iostats *nfs_alloc_iostats(void)
> +{
> + struct nfs_iostats *new;
> + new = kmalloc(sizeof(struct nfs_iostats) * NR_CPUS, GFP_KERNEL);
> + if (new)
> + memset(new, 0, sizeof(struct nfs_iostats) * NR_CPUS);
> + return new;
> +}
> +

You'd be better off using alloc_percpu() here, so each CPU's counter goes
into its node-local memory.

Or simply use .  AFACIT the warning at the top of
that file isn't true any more.  A 4-byte counter on a 32-way should consume
just a little over 256 bytes.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: files of size larger than fs size

2005-03-17 Thread Max
Dave,
Shouldn't "fread(&data,sizeof(data),1,f)" and "read(fn, &data, 
sizeof(data))" produce identical results?
Is it a bug or what?

Max
Dave Kleikamp wrote:
With this change, the file size on jfs becomes 2^48 + 4 as expected.
--- jfs_bug.c.orig  2005-03-17 14:18:48.229634648 -0600
+++ jfs_bug.c   2005-03-17 15:32:45.952750104 -0600
@@ -13,12 +13,14 @@ int data = 0;
struct flock fl;
void read1() {
-size_t rc = fread(&data,sizeof(data),1,f);
+/* size_t rc = fread(&data,sizeof(data),1,f); */
+size_t rc = read(fn, &data, sizeof(data));
printf("read() rc = %llu\n",rc);
}
void write1() {
-size_t rc = fwrite(&data,sizeof(data),1,f);
+/* size_t rc = fwrite(&data,sizeof(data),1,f); */
+size_t rc = write(fn, &data, sizeof(data));
printf("write() rc = %llu\n",rc);
}
 

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: files of size larger than fs size

2005-03-17 Thread Dave Kleikamp
With this change, the file size on jfs becomes 2^48 + 4 as expected.

--- jfs_bug.c.orig  2005-03-17 14:18:48.229634648 -0600
+++ jfs_bug.c   2005-03-17 15:32:45.952750104 -0600
@@ -13,12 +13,14 @@ int data = 0;
 struct flock fl;
 
 void read1() {
-size_t rc = fread(&data,sizeof(data),1,f);
+/* size_t rc = fread(&data,sizeof(data),1,f); */
+size_t rc = read(fn, &data, sizeof(data));
 printf("read() rc = %llu\n",rc);
 }
 
 void write1() {
-size_t rc = fwrite(&data,sizeof(data),1,f);
+/* size_t rc = fwrite(&data,sizeof(data),1,f); */
+size_t rc = write(fn, &data, sizeof(data));
 printf("write() rc = %llu\n",rc);
 }
 

-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: files of size larger than fs size

2005-03-17 Thread Dave Kleikamp
On Thu, 2005-03-17 at 14:51 -0600, Dave Kleikamp wrote:
> On Thu, 2005-03-17 at 12:06 -0800, Bryan Henderson wrote:
> > >I found
> > >that for larger values, your test program is returning -1, but unsigned
> > >it appears as 18446744073709551615.
> > 
> > You mean you ran it?  Then what about the more interesting question of 
> > what your filesize ends up to be?  You say JFS allows files up to 2**52 
> > bytes, so I expect the test case would succeed up through the write at 
> > 2**48 and leave the filesize 2**48 + 8.  But Max reports seeing 2**48 - 
> > 4080.
> 
> Yeah, this is funny.  I missed that part of the story.  I'll try to
> figure out what is going on here.

The problem appears to be mixing calls to lseek64 with calls to fread
and fwrite.  fread & fwrite don't appear to honor the position set by
lseek64.  I assume replacing fread & fwrite with read & write will fix
it.  I haven't tried it yet, but I'm about to.

-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: files of size larger than fs size

2005-03-17 Thread Dave Kleikamp
On Thu, 2005-03-17 at 12:06 -0800, Bryan Henderson wrote:
> >I found
> >that for larger values, your test program is returning -1, but unsigned
> >it appears as 18446744073709551615.
> 
> You mean you ran it?  Then what about the more interesting question of 
> what your filesize ends up to be?  You say JFS allows files up to 2**52 
> bytes, so I expect the test case would succeed up through the write at 
> 2**48 and leave the filesize 2**48 + 8.  But Max reports seeing 2**48 - 
> 4080.

Yeah, this is funny.  I missed that part of the story.  I'll try to
figure out what is going on here.

> It's conceivable that the reporting of the filesize is wrong, by the way.

jfs_debugfs tells me the same thing.  So ls has it right.

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: files of size larger than fs size

2005-03-17 Thread Bryan Henderson
>I found
>that for larger values, your test program is returning -1, but unsigned
>it appears as 18446744073709551615.

You mean you ran it?  Then what about the more interesting question of 
what your filesize ends up to be?  You say JFS allows files up to 2**52 
bytes, so I expect the test case would succeed up through the write at 
2**48 and leave the filesize 2**48 + 8.  But Max reports seeing 2**48 - 
4080.

It's conceivable that the reporting of the filesize is wrong, by the way.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] NFS: add I/O performance counters

2005-03-17 Thread Chuck Lever
 Add an extensible per-superblock performance counter facility to the NFS
 client.  This facility mimics the counters available for block devices and
 for networking.

 Expose these new counters via /proc/self/mountstats.

 Version: Mon, 14 Mar 2005 17:06:12 -0500
 
 Signed-off-by: Chuck Lever <[EMAIL PROTECTED]>
---
 
 fs/nfs/dir.c   |8 ++
 fs/nfs/direct.c|5 +
 fs/nfs/file.c  |   20 +++--
 fs/nfs/inode.c |  126 +++--
 fs/nfs/pagelist.c  |   12 ++-
 fs/nfs/read.c  |7 ++
 fs/nfs/write.c |   10 ++
 include/linux/nfs_fs_sb.h  |5 +
 include/linux/nfs_iostat.h |   80 +++
 9 files changed, 256 insertions(+), 17 deletions(-)
 
 
diff -X /home/cel/src/linux/dont-diff -Naurp 01-mountstats/fs/nfs/dir.c 
02-nfs-iostat/fs/nfs/dir.c
--- 01-mountstats/fs/nfs/dir.c  2005-03-02 02:38:09.0 -0500
+++ 02-nfs-iostat/fs/nfs/dir.c  2005-03-14 15:28:34.011484000 -0500
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -428,6 +429,8 @@ static int nfs_readdir(struct file *filp
 
lock_kernel();
 
+   nfs_inc_stats(inode, NFS_VFS_GETDENTS);
+
res = nfs_revalidate_inode(NFS_SERVER(inode), inode);
if (res < 0) {
unlock_kernel();
@@ -584,6 +587,7 @@ static int nfs_lookup_revalidate(struct 
parent = dget_parent(dentry);
lock_kernel();
dir = parent->d_inode;
+   nfs_inc_stats(dir, NFS_DENTRY_REVALIDATE);
inode = dentry->d_inode;
 
if (nd && !(nd->flags & LOOKUP_CONTINUE) && (nd->flags & LOOKUP_OPEN))
@@ -712,6 +716,7 @@ static struct dentry *nfs_lookup(struct 
 
dfprintk(VFS, "NFS: lookup(%s/%s)\n",
dentry->d_parent->d_name.name, dentry->d_name.name);
+   nfs_inc_stats(dir, NFS_VFS_LOOKUP);
 
res = ERR_PTR(-ENAMETOOLONG);
if (dentry->d_name.len > NFS_SERVER(dir)->namelen)
@@ -1116,6 +1121,7 @@ static int nfs_sillyrename(struct inode 
dfprintk(VFS, "NFS: silly-rename(%s/%s, ct=%d)\n",
dentry->d_parent->d_name.name, dentry->d_name.name, 
atomic_read(&dentry->d_count));
+   nfs_inc_stats(dir, NFS_SILLY_RENAME);
 
 #ifdef NFS_PARANOIA
 if (!dentry->d_inode)
@@ -1500,6 +1506,8 @@ int nfs_permission(struct inode *inode, 
struct rpc_cred *cred;
int res;
 
+   nfs_inc_stats(inode, NFS_VFS_ACCESS);
+
if (mask == 0)
return 0;
 
diff -X /home/cel/src/linux/dont-diff -Naurp 01-mountstats/fs/nfs/direct.c 
02-nfs-iostat/fs/nfs/direct.c
--- 01-mountstats/fs/nfs/direct.c   2005-03-02 02:38:25.0 -0500
+++ 02-nfs-iostat/fs/nfs/direct.c   2005-03-14 15:26:16.401349000 -0500
@@ -47,6 +47,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
@@ -354,6 +355,8 @@ static ssize_t nfs_direct_read_seg(struc
result = nfs_direct_read_wait(dreq, clnt->cl_intr);
rpc_clnt_sigunmask(clnt, &oldset);
 
+   nfs_add_stats(inode, NFS_WIRE_READ_BYTES, result);
+   nfs_add_stats(inode, NFS_DIRECT_READ_BYTES, result);
return result;
 }
 
@@ -576,6 +579,8 @@ static ssize_t nfs_direct_write(struct i
if (result < size)
break;
}
+   nfs_add_stats(inode, NFS_WIRE_WRITTEN_BYTES, tot_bytes);
+   nfs_add_stats(inode, NFS_DIRECT_WRITTEN_BYTES, tot_bytes);
return tot_bytes;
 }
 
diff -X /home/cel/src/linux/dont-diff -Naurp 01-mountstats/fs/nfs/file.c 
02-nfs-iostat/fs/nfs/file.c
--- 01-mountstats/fs/nfs/file.c 2005-03-02 02:38:38.0 -0500
+++ 02-nfs-iostat/fs/nfs/file.c 2005-03-14 15:42:52.446804000 -0500
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -86,18 +87,15 @@ static int nfs_check_flags(int flags)
 static int
 nfs_file_open(struct inode *inode, struct file *filp)
 {
-   struct nfs_server *server = NFS_SERVER(inode);
-   int (*open)(struct inode *, struct file *);
int res;
 
res = nfs_check_flags(filp->f_flags);
if (res)
return res;
 
+   nfs_inc_stats(inode, NFS_VFS_OPEN);
lock_kernel();
-   /* Do NFSv4 open() call */
-   if ((open = server->rpc_ops->file_open) != NULL)
-   res = open(inode, filp);
+   res = NFS_SERVER(inode)->rpc_ops->file_open(inode, filp);
unlock_kernel();
return res;
 }
@@ -105,6 +103,7 @@ nfs_file_open(struct inode *inode, struc
 static int
 nfs_file_release(struct inode *inode, struct file *filp)
 {
+   nfs_inc_stats(inode, NFS_VFS_CLOSE);
return NFS_PROTO(inode)->file_release(inode, filp);
 }
 
@@ -123,6 +122,7 @@ nfs_file_flush(struct file *file)
 
if ((file->f_mode & FMODE_WRITE) == 0)
return 0;
+   nfs_inc_stats(inode, NFS_VFS_FLUSH);
lock_kernel();
/* Ensure that data+attribute caches are up to date after cl

[PATCH 1/2] VFS: New /proc file /proc/self/mountstats

2005-03-17 Thread Chuck Lever
 Create a new file under /proc/self, called mountstats, where mounted file
 systems can export information (configuration options, performance counters,
 and so on).  Use a mechanism similar to /proc/mounts and s_ops->show_options.

 This mechanism does not violate namespace security, and is safe to use while
 other processes are unmounting file systems.

 Version: Mon, 14 Mar 2005 17:06:04 -0500
 
 Signed-off-by: Chuck Lever <[EMAIL PROTECTED]>
---
 
 fs/namespace.c |   66 +
 fs/proc/base.c |   40 +++
 include/linux/fs.h |1 
 3 files changed, 107 insertions(+)
 
 
diff -X /home/cel/src/linux/dont-diff -Naurp 00-stock/fs/namespace.c 
01-mountstats/fs/namespace.c
--- 00-stock/fs/namespace.c 2005-03-02 02:38:13.0 -0500
+++ 01-mountstats/fs/namespace.c2005-03-14 15:24:51.565085000 -0500
@@ -265,6 +265,72 @@ struct seq_operations mounts_op = {
.show   = show_vfsmnt
 };
 
+/* iterator */
+static void *ms_start(struct seq_file *m, loff_t *pos)
+{
+   struct namespace *n = m->private;
+   struct list_head *p;
+   loff_t l = *pos;
+
+   down_read(&n->sem);
+   list_for_each(p, &n->list)
+   if (!l--)
+   return list_entry(p, struct vfsmount, mnt_list);
+   return NULL;
+}
+
+static void *ms_next(struct seq_file *m, void *v, loff_t *pos)
+{
+   struct namespace *n = m->private;
+   struct list_head *p = ((struct vfsmount *)v)->mnt_list.next;
+   (*pos)++;
+   return p==&n->list ? NULL : list_entry(p, struct vfsmount, mnt_list);
+}
+
+static void ms_stop(struct seq_file *m, void *v)
+{
+   struct namespace *n = m->private;
+   up_read(&n->sem);
+}
+
+static int show_vfsstat(struct seq_file *m, void *v)
+{
+   struct vfsmount *mnt = v;
+   int err = 0;
+
+   /* device */
+   if (mnt->mnt_devname) {
+   seq_puts(m, "device ");
+   mangle(m, mnt->mnt_devname);
+   } else
+   seq_puts(m, "no device");
+
+   /* mount point */
+   seq_puts(m, " mounted on ");
+   seq_path(m, mnt, mnt->mnt_root, " \t\n\\");
+   seq_putc(m, ' ');
+
+   /* file system type */
+   seq_puts(m, "with fstype ");
+   mangle(m, mnt->mnt_sb->s_type->name);
+
+   /* optional statistics */
+   if (mnt->mnt_sb->s_op->show_stats) {
+   seq_putc(m, ' ');
+   err = mnt->mnt_sb->s_op->show_stats(m, mnt);
+   }
+
+   seq_putc(m, '\n');
+   return err;
+}
+
+struct seq_operations mountstats_op = {
+   .start  = ms_start,
+   .next   = ms_next,
+   .stop   = ms_stop,
+   .show   = show_vfsstat,
+};
+
 /**
  * may_umount_tree - check if a mount tree is busy
  * @mnt: root of mount tree
diff -X /home/cel/src/linux/dont-diff -Naurp 00-stock/fs/proc/base.c 
01-mountstats/fs/proc/base.c
--- 00-stock/fs/proc/base.c 2005-03-02 02:38:12.0 -0500
+++ 01-mountstats/fs/proc/base.c2005-03-14 15:24:51.571085000 -0500
@@ -60,6 +60,7 @@ enum pid_directory_inos {
PROC_TGID_STATM,
PROC_TGID_MAPS,
PROC_TGID_MOUNTS,
+   PROC_TGID_MOUNTSTATS,
PROC_TGID_WCHAN,
 #ifdef CONFIG_SCHEDSTATS
PROC_TGID_SCHEDSTAT,
@@ -91,6 +92,7 @@ enum pid_directory_inos {
PROC_TID_STATM,
PROC_TID_MAPS,
PROC_TID_MOUNTS,
+   PROC_TID_MOUNTSTATS,
PROC_TID_WCHAN,
 #ifdef CONFIG_SCHEDSTATS
PROC_TID_SCHEDSTAT,
@@ -134,6 +136,7 @@ static struct pid_entry tgid_base_stuff[
E(PROC_TGID_ROOT,  "root",S_IFLNK|S_IRWXUGO),
E(PROC_TGID_EXE,   "exe", S_IFLNK|S_IRWXUGO),
E(PROC_TGID_MOUNTS,"mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TGID_MOUNTSTATS, "mountstats", S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TGID_ATTR,  "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -164,6 +167,7 @@ static struct pid_entry tid_base_stuff[]
E(PROC_TID_ROOT,   "root",S_IFLNK|S_IRWXUGO),
E(PROC_TID_EXE,"exe", S_IFLNK|S_IRWXUGO),
E(PROC_TID_MOUNTS, "mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TID_MOUNTSTATS, "mountstats", S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TID_ATTR,   "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -528,6 +532,38 @@ static struct file_operations proc_mount
.release= mounts_release,
 };
 
+extern struct seq_operations mountstats_op;
+static int mountstats_open(struct inode *inode, struct file *file)
+{
+   struct task_struct *task = proc_task(inode);
+   int ret = seq_open(file, &mountstats_op);
+
+   if (!ret) {
+   struct seq_file *m = file->private_data;
+   struct namespace *namespace;
+   task_lock(task);
+   namespace = task->namespace;
+   if (namespace)
+   get_namespace(namespace);
+   task_unlock(task);
+
+   if (namespace

[PATCH 0/2] RFC: exporting per-superblock statistics to user space

2005-03-17 Thread Chuck Lever
 We still have a need to provide "iostat" like statistics for NFS
 clients.  Following are a couple of patches, against 2.6.11.3, which
 prototype an approach for providing this kind of data to user programs.

 I'd like some comment on the approach.

 01-mountstats.patch adds a new file called /proc/self/mountstats and a
 new file system method called show_stats.  this just replicates
 /proc/mounts and the show_options hook.

 02-nfs-iostat.patch teachs the NFS client to use the new show_stats
 hook as a demonstration.

 Note that this approach addresses previously voiced concerns about
 exporting per-superblock stats to user space.

 1. Processes can't see stats for file systems mounted outside their
namespace.

 2. Reading the stats file is serialized with mount and unmount
operations.

 3. The approach doesn't use /sys or kobjects.

 4. There are no lifetime issues tied to file systems loaded as a
module.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: files of size larger than fs size

2005-03-17 Thread Dave Kleikamp
On Wed, 2005-03-16 at 20:52 -0800, Max wrote:
> Bryan,
> 
> I'm not experienced in filesystems. But I've derived the testcase
> program from some buggy application that occasionally created huge
> files on my fs. That was not so easy to reproduce since not every
> sequence of seeks/reads/writes results in a huge file. But finally I
> got it 100% reproducible.
> 
> So I would appriciate if somebody make any good from that testcase
> code.

man lseek tells you to test the return code against (off_t)-1.  I found
that for larger values, your test program is returning -1, but unsigned
it appears as 18446744073709551615.

> 
> Different filesizes look strange to me. What I have so far:
> 
> JFS:281474976706576  =  2**48 - 4080
> XFS:  72057594037923856  =  2**56 - 4080
> EXT3: 1099511627784  =  2**40 + 8

I can speak for jfs, and it supports file sizes up to 2**52.  It uses 40
bits to address the logical block offset within a file with a 4K (2**12)
size block.  The individual file systems set sb->s_maxbytes to the
appropriate value.

> Thanks,
> Max
> 
> P.S. direct link to the testcase program: 
> http://bugzilla.kernel.org/attachment.cgi?id=4729

-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html