Re: files of size larger than fs size
>The problem appears to be mixing calls to lseek64 with calls to fread >and fwrite. Oh, of course. I didn't see that. You can't use the file descriptor of a file that is opened as a stream. This test case uses the fileno() function to mess with the internals of the stream. fseeko64() is the proper function to position a stream. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: files of size larger than fs size
On Thu, 2005-03-17 at 13:48 -0800, Max wrote: > Dave, > > Shouldn't "fread(&data,sizeof(data),1,f)" and "read(fn, &data, > sizeof(data))" produce identical results? > Is it a bug or what? fseek(f, offset, SEEK_SET); fread(&data, sizeof(data, 1, f); should produce identical results to lseek(fn, offset, SEEK_SET); read(fn, &data, sizeof(data); However lseek(fn, offset, SEEK_SET); fread(&data, sizeof(data, 1, f); isn't valid. -- David Kleikamp IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] NFS: add I/O performance counters
[EMAIL PROTECTED] (Chuck Lever) wrote: > > +static inline void nfs_inc_stats(struct inode *inode, unsigned int stat) > +{ > + struct nfs_iostats *iostats = NFS_SERVER(inode)->io_stats; > + iostats[smp_processor_id()].counts[stat]++; > +} The use of smp_processor_id() outside locks should spit a runtime warning. And it is racy: if you switch CPUs between the read and the write (via preemption), the stats will be corrupted. A preempt_disable()/enable() will fix those things up. > +static inline struct nfs_iostats *nfs_alloc_iostats(void) > +{ > + struct nfs_iostats *new; > + new = kmalloc(sizeof(struct nfs_iostats) * NR_CPUS, GFP_KERNEL); > + if (new) > + memset(new, 0, sizeof(struct nfs_iostats) * NR_CPUS); > + return new; > +} > + You'd be better off using alloc_percpu() here, so each CPU's counter goes into its node-local memory. Or simply use . AFACIT the warning at the top of that file isn't true any more. A 4-byte counter on a 32-way should consume just a little over 256 bytes. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: files of size larger than fs size
Dave, Shouldn't "fread(&data,sizeof(data),1,f)" and "read(fn, &data, sizeof(data))" produce identical results? Is it a bug or what? Max Dave Kleikamp wrote: With this change, the file size on jfs becomes 2^48 + 4 as expected. --- jfs_bug.c.orig 2005-03-17 14:18:48.229634648 -0600 +++ jfs_bug.c 2005-03-17 15:32:45.952750104 -0600 @@ -13,12 +13,14 @@ int data = 0; struct flock fl; void read1() { -size_t rc = fread(&data,sizeof(data),1,f); +/* size_t rc = fread(&data,sizeof(data),1,f); */ +size_t rc = read(fn, &data, sizeof(data)); printf("read() rc = %llu\n",rc); } void write1() { -size_t rc = fwrite(&data,sizeof(data),1,f); +/* size_t rc = fwrite(&data,sizeof(data),1,f); */ +size_t rc = write(fn, &data, sizeof(data)); printf("write() rc = %llu\n",rc); } - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: files of size larger than fs size
With this change, the file size on jfs becomes 2^48 + 4 as expected. --- jfs_bug.c.orig 2005-03-17 14:18:48.229634648 -0600 +++ jfs_bug.c 2005-03-17 15:32:45.952750104 -0600 @@ -13,12 +13,14 @@ int data = 0; struct flock fl; void read1() { -size_t rc = fread(&data,sizeof(data),1,f); +/* size_t rc = fread(&data,sizeof(data),1,f); */ +size_t rc = read(fn, &data, sizeof(data)); printf("read() rc = %llu\n",rc); } void write1() { -size_t rc = fwrite(&data,sizeof(data),1,f); +/* size_t rc = fwrite(&data,sizeof(data),1,f); */ +size_t rc = write(fn, &data, sizeof(data)); printf("write() rc = %llu\n",rc); } -- David Kleikamp IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: files of size larger than fs size
On Thu, 2005-03-17 at 14:51 -0600, Dave Kleikamp wrote: > On Thu, 2005-03-17 at 12:06 -0800, Bryan Henderson wrote: > > >I found > > >that for larger values, your test program is returning -1, but unsigned > > >it appears as 18446744073709551615. > > > > You mean you ran it? Then what about the more interesting question of > > what your filesize ends up to be? You say JFS allows files up to 2**52 > > bytes, so I expect the test case would succeed up through the write at > > 2**48 and leave the filesize 2**48 + 8. But Max reports seeing 2**48 - > > 4080. > > Yeah, this is funny. I missed that part of the story. I'll try to > figure out what is going on here. The problem appears to be mixing calls to lseek64 with calls to fread and fwrite. fread & fwrite don't appear to honor the position set by lseek64. I assume replacing fread & fwrite with read & write will fix it. I haven't tried it yet, but I'm about to. -- David Kleikamp IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: files of size larger than fs size
On Thu, 2005-03-17 at 12:06 -0800, Bryan Henderson wrote: > >I found > >that for larger values, your test program is returning -1, but unsigned > >it appears as 18446744073709551615. > > You mean you ran it? Then what about the more interesting question of > what your filesize ends up to be? You say JFS allows files up to 2**52 > bytes, so I expect the test case would succeed up through the write at > 2**48 and leave the filesize 2**48 + 8. But Max reports seeing 2**48 - > 4080. Yeah, this is funny. I missed that part of the story. I'll try to figure out what is going on here. > It's conceivable that the reporting of the filesize is wrong, by the way. jfs_debugfs tells me the same thing. So ls has it right. Thanks, Shaggy -- David Kleikamp IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: files of size larger than fs size
>I found >that for larger values, your test program is returning -1, but unsigned >it appears as 18446744073709551615. You mean you ran it? Then what about the more interesting question of what your filesize ends up to be? You say JFS allows files up to 2**52 bytes, so I expect the test case would succeed up through the write at 2**48 and leave the filesize 2**48 + 8. But Max reports seeing 2**48 - 4080. It's conceivable that the reporting of the filesize is wrong, by the way. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] NFS: add I/O performance counters
Add an extensible per-superblock performance counter facility to the NFS client. This facility mimics the counters available for block devices and for networking. Expose these new counters via /proc/self/mountstats. Version: Mon, 14 Mar 2005 17:06:12 -0500 Signed-off-by: Chuck Lever <[EMAIL PROTECTED]> --- fs/nfs/dir.c |8 ++ fs/nfs/direct.c|5 + fs/nfs/file.c | 20 +++-- fs/nfs/inode.c | 126 +++-- fs/nfs/pagelist.c | 12 ++- fs/nfs/read.c |7 ++ fs/nfs/write.c | 10 ++ include/linux/nfs_fs_sb.h |5 + include/linux/nfs_iostat.h | 80 +++ 9 files changed, 256 insertions(+), 17 deletions(-) diff -X /home/cel/src/linux/dont-diff -Naurp 01-mountstats/fs/nfs/dir.c 02-nfs-iostat/fs/nfs/dir.c --- 01-mountstats/fs/nfs/dir.c 2005-03-02 02:38:09.0 -0500 +++ 02-nfs-iostat/fs/nfs/dir.c 2005-03-14 15:28:34.011484000 -0500 @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -428,6 +429,8 @@ static int nfs_readdir(struct file *filp lock_kernel(); + nfs_inc_stats(inode, NFS_VFS_GETDENTS); + res = nfs_revalidate_inode(NFS_SERVER(inode), inode); if (res < 0) { unlock_kernel(); @@ -584,6 +587,7 @@ static int nfs_lookup_revalidate(struct parent = dget_parent(dentry); lock_kernel(); dir = parent->d_inode; + nfs_inc_stats(dir, NFS_DENTRY_REVALIDATE); inode = dentry->d_inode; if (nd && !(nd->flags & LOOKUP_CONTINUE) && (nd->flags & LOOKUP_OPEN)) @@ -712,6 +716,7 @@ static struct dentry *nfs_lookup(struct dfprintk(VFS, "NFS: lookup(%s/%s)\n", dentry->d_parent->d_name.name, dentry->d_name.name); + nfs_inc_stats(dir, NFS_VFS_LOOKUP); res = ERR_PTR(-ENAMETOOLONG); if (dentry->d_name.len > NFS_SERVER(dir)->namelen) @@ -1116,6 +1121,7 @@ static int nfs_sillyrename(struct inode dfprintk(VFS, "NFS: silly-rename(%s/%s, ct=%d)\n", dentry->d_parent->d_name.name, dentry->d_name.name, atomic_read(&dentry->d_count)); + nfs_inc_stats(dir, NFS_SILLY_RENAME); #ifdef NFS_PARANOIA if (!dentry->d_inode) @@ -1500,6 +1506,8 @@ int nfs_permission(struct inode *inode, struct rpc_cred *cred; int res; + nfs_inc_stats(inode, NFS_VFS_ACCESS); + if (mask == 0) return 0; diff -X /home/cel/src/linux/dont-diff -Naurp 01-mountstats/fs/nfs/direct.c 02-nfs-iostat/fs/nfs/direct.c --- 01-mountstats/fs/nfs/direct.c 2005-03-02 02:38:25.0 -0500 +++ 02-nfs-iostat/fs/nfs/direct.c 2005-03-14 15:26:16.401349000 -0500 @@ -47,6 +47,7 @@ #include #include +#include #include #include @@ -354,6 +355,8 @@ static ssize_t nfs_direct_read_seg(struc result = nfs_direct_read_wait(dreq, clnt->cl_intr); rpc_clnt_sigunmask(clnt, &oldset); + nfs_add_stats(inode, NFS_WIRE_READ_BYTES, result); + nfs_add_stats(inode, NFS_DIRECT_READ_BYTES, result); return result; } @@ -576,6 +579,8 @@ static ssize_t nfs_direct_write(struct i if (result < size) break; } + nfs_add_stats(inode, NFS_WIRE_WRITTEN_BYTES, tot_bytes); + nfs_add_stats(inode, NFS_DIRECT_WRITTEN_BYTES, tot_bytes); return tot_bytes; } diff -X /home/cel/src/linux/dont-diff -Naurp 01-mountstats/fs/nfs/file.c 02-nfs-iostat/fs/nfs/file.c --- 01-mountstats/fs/nfs/file.c 2005-03-02 02:38:38.0 -0500 +++ 02-nfs-iostat/fs/nfs/file.c 2005-03-14 15:42:52.446804000 -0500 @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -86,18 +87,15 @@ static int nfs_check_flags(int flags) static int nfs_file_open(struct inode *inode, struct file *filp) { - struct nfs_server *server = NFS_SERVER(inode); - int (*open)(struct inode *, struct file *); int res; res = nfs_check_flags(filp->f_flags); if (res) return res; + nfs_inc_stats(inode, NFS_VFS_OPEN); lock_kernel(); - /* Do NFSv4 open() call */ - if ((open = server->rpc_ops->file_open) != NULL) - res = open(inode, filp); + res = NFS_SERVER(inode)->rpc_ops->file_open(inode, filp); unlock_kernel(); return res; } @@ -105,6 +103,7 @@ nfs_file_open(struct inode *inode, struc static int nfs_file_release(struct inode *inode, struct file *filp) { + nfs_inc_stats(inode, NFS_VFS_CLOSE); return NFS_PROTO(inode)->file_release(inode, filp); } @@ -123,6 +122,7 @@ nfs_file_flush(struct file *file) if ((file->f_mode & FMODE_WRITE) == 0) return 0; + nfs_inc_stats(inode, NFS_VFS_FLUSH); lock_kernel(); /* Ensure that data+attribute caches are up to date after cl
[PATCH 1/2] VFS: New /proc file /proc/self/mountstats
Create a new file under /proc/self, called mountstats, where mounted file systems can export information (configuration options, performance counters, and so on). Use a mechanism similar to /proc/mounts and s_ops->show_options. This mechanism does not violate namespace security, and is safe to use while other processes are unmounting file systems. Version: Mon, 14 Mar 2005 17:06:04 -0500 Signed-off-by: Chuck Lever <[EMAIL PROTECTED]> --- fs/namespace.c | 66 + fs/proc/base.c | 40 +++ include/linux/fs.h |1 3 files changed, 107 insertions(+) diff -X /home/cel/src/linux/dont-diff -Naurp 00-stock/fs/namespace.c 01-mountstats/fs/namespace.c --- 00-stock/fs/namespace.c 2005-03-02 02:38:13.0 -0500 +++ 01-mountstats/fs/namespace.c2005-03-14 15:24:51.565085000 -0500 @@ -265,6 +265,72 @@ struct seq_operations mounts_op = { .show = show_vfsmnt }; +/* iterator */ +static void *ms_start(struct seq_file *m, loff_t *pos) +{ + struct namespace *n = m->private; + struct list_head *p; + loff_t l = *pos; + + down_read(&n->sem); + list_for_each(p, &n->list) + if (!l--) + return list_entry(p, struct vfsmount, mnt_list); + return NULL; +} + +static void *ms_next(struct seq_file *m, void *v, loff_t *pos) +{ + struct namespace *n = m->private; + struct list_head *p = ((struct vfsmount *)v)->mnt_list.next; + (*pos)++; + return p==&n->list ? NULL : list_entry(p, struct vfsmount, mnt_list); +} + +static void ms_stop(struct seq_file *m, void *v) +{ + struct namespace *n = m->private; + up_read(&n->sem); +} + +static int show_vfsstat(struct seq_file *m, void *v) +{ + struct vfsmount *mnt = v; + int err = 0; + + /* device */ + if (mnt->mnt_devname) { + seq_puts(m, "device "); + mangle(m, mnt->mnt_devname); + } else + seq_puts(m, "no device"); + + /* mount point */ + seq_puts(m, " mounted on "); + seq_path(m, mnt, mnt->mnt_root, " \t\n\\"); + seq_putc(m, ' '); + + /* file system type */ + seq_puts(m, "with fstype "); + mangle(m, mnt->mnt_sb->s_type->name); + + /* optional statistics */ + if (mnt->mnt_sb->s_op->show_stats) { + seq_putc(m, ' '); + err = mnt->mnt_sb->s_op->show_stats(m, mnt); + } + + seq_putc(m, '\n'); + return err; +} + +struct seq_operations mountstats_op = { + .start = ms_start, + .next = ms_next, + .stop = ms_stop, + .show = show_vfsstat, +}; + /** * may_umount_tree - check if a mount tree is busy * @mnt: root of mount tree diff -X /home/cel/src/linux/dont-diff -Naurp 00-stock/fs/proc/base.c 01-mountstats/fs/proc/base.c --- 00-stock/fs/proc/base.c 2005-03-02 02:38:12.0 -0500 +++ 01-mountstats/fs/proc/base.c2005-03-14 15:24:51.571085000 -0500 @@ -60,6 +60,7 @@ enum pid_directory_inos { PROC_TGID_STATM, PROC_TGID_MAPS, PROC_TGID_MOUNTS, + PROC_TGID_MOUNTSTATS, PROC_TGID_WCHAN, #ifdef CONFIG_SCHEDSTATS PROC_TGID_SCHEDSTAT, @@ -91,6 +92,7 @@ enum pid_directory_inos { PROC_TID_STATM, PROC_TID_MAPS, PROC_TID_MOUNTS, + PROC_TID_MOUNTSTATS, PROC_TID_WCHAN, #ifdef CONFIG_SCHEDSTATS PROC_TID_SCHEDSTAT, @@ -134,6 +136,7 @@ static struct pid_entry tgid_base_stuff[ E(PROC_TGID_ROOT, "root",S_IFLNK|S_IRWXUGO), E(PROC_TGID_EXE, "exe", S_IFLNK|S_IRWXUGO), E(PROC_TGID_MOUNTS,"mounts", S_IFREG|S_IRUGO), + E(PROC_TGID_MOUNTSTATS, "mountstats", S_IFREG|S_IRUGO), #ifdef CONFIG_SECURITY E(PROC_TGID_ATTR, "attr",S_IFDIR|S_IRUGO|S_IXUGO), #endif @@ -164,6 +167,7 @@ static struct pid_entry tid_base_stuff[] E(PROC_TID_ROOT, "root",S_IFLNK|S_IRWXUGO), E(PROC_TID_EXE,"exe", S_IFLNK|S_IRWXUGO), E(PROC_TID_MOUNTS, "mounts", S_IFREG|S_IRUGO), + E(PROC_TID_MOUNTSTATS, "mountstats", S_IFREG|S_IRUGO), #ifdef CONFIG_SECURITY E(PROC_TID_ATTR, "attr",S_IFDIR|S_IRUGO|S_IXUGO), #endif @@ -528,6 +532,38 @@ static struct file_operations proc_mount .release= mounts_release, }; +extern struct seq_operations mountstats_op; +static int mountstats_open(struct inode *inode, struct file *file) +{ + struct task_struct *task = proc_task(inode); + int ret = seq_open(file, &mountstats_op); + + if (!ret) { + struct seq_file *m = file->private_data; + struct namespace *namespace; + task_lock(task); + namespace = task->namespace; + if (namespace) + get_namespace(namespace); + task_unlock(task); + + if (namespace
[PATCH 0/2] RFC: exporting per-superblock statistics to user space
We still have a need to provide "iostat" like statistics for NFS clients. Following are a couple of patches, against 2.6.11.3, which prototype an approach for providing this kind of data to user programs. I'd like some comment on the approach. 01-mountstats.patch adds a new file called /proc/self/mountstats and a new file system method called show_stats. this just replicates /proc/mounts and the show_options hook. 02-nfs-iostat.patch teachs the NFS client to use the new show_stats hook as a demonstration. Note that this approach addresses previously voiced concerns about exporting per-superblock stats to user space. 1. Processes can't see stats for file systems mounted outside their namespace. 2. Reading the stats file is serialized with mount and unmount operations. 3. The approach doesn't use /sys or kobjects. 4. There are no lifetime issues tied to file systems loaded as a module. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: files of size larger than fs size
On Wed, 2005-03-16 at 20:52 -0800, Max wrote: > Bryan, > > I'm not experienced in filesystems. But I've derived the testcase > program from some buggy application that occasionally created huge > files on my fs. That was not so easy to reproduce since not every > sequence of seeks/reads/writes results in a huge file. But finally I > got it 100% reproducible. > > So I would appriciate if somebody make any good from that testcase > code. man lseek tells you to test the return code against (off_t)-1. I found that for larger values, your test program is returning -1, but unsigned it appears as 18446744073709551615. > > Different filesizes look strange to me. What I have so far: > > JFS:281474976706576 = 2**48 - 4080 > XFS: 72057594037923856 = 2**56 - 4080 > EXT3: 1099511627784 = 2**40 + 8 I can speak for jfs, and it supports file sizes up to 2**52. It uses 40 bits to address the logical block offset within a file with a 4K (2**12) size block. The individual file systems set sb->s_maxbytes to the appropriate value. > Thanks, > Max > > P.S. direct link to the testcase program: > http://bugzilla.kernel.org/attachment.cgi?id=4729 -- David Kleikamp IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html