2.3.99-pre1 VFS comments

2000-03-19 Thread Erez Zadok

Hi Al,

Ion and I have spent the past few days updating our stackable templates for
2.3.99-pre1, and we have found various oddities and possible problems.  We
promised to report anything interesting we found wrt the VFS, so here it
is.  We are willing to test and submit patches for any of the items below
that you think are worth it.

(1) Asymmetry b/t double_lock and double_unlock:

Only double_unlock does dput() on the two dentries.  The only place where
double_lock is called is in do_rename, and do_rename already calls
get_parent() which increments the reference counts.  We can simplify the
code and make it symmetric by moving the two get_parent() calls into
double_lock().
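A minimal userspace sketch of the proposed symmetry, assuming toy stand-ins: the struct, dget()/dput(), and the lock flag below are hypothetical models, not the 2.3.99 kernel types.  double_lock() takes the two references itself, so double_unlock() drops exactly what double_lock() acquired.  Address-ordered locking is included as the usual way to keep two concurrent renames from deadlocking.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins (hypothetical): a dentry with a refcount and a lock flag. */
struct dentry {
	int d_count;	/* reference count, like the real d_count */
	int locked;	/* stands in for down()/up() on the inode semaphore */
};

static void dget(struct dentry *d) { d->d_count++; }
static void dput(struct dentry *d) { assert(d->d_count > 0); d->d_count--; }
static void toy_lock(struct dentry *d)   { assert(!d->locked); d->locked = 1; }
static void toy_unlock(struct dentry *d) { assert(d->locked); d->locked = 0; }

/* Proposed: double_lock() takes the references that double_unlock() drops,
 * so do_rename() no longer needs its own get_parent() calls. */
void double_lock(struct dentry *d1, struct dentry *d2)
{
	dget(d1);
	dget(d2);
	if (d1 == d2) {		/* same parent: lock it once */
		toy_lock(d1);
		return;
	}
	/* Always lock in address order so two renames can't deadlock ABBA-style. */
	if ((uintptr_t) d1 < (uintptr_t) d2) {
		toy_lock(d1);
		toy_lock(d2);
	} else {
		toy_lock(d2);
		toy_lock(d1);
	}
}

void double_unlock(struct dentry *d1, struct dentry *d2)
{
	toy_unlock(d1);
	if (d1 != d2)
		toy_unlock(d2);
	dput(d1);
	dput(d2);
}
```

With the references taken and dropped in the same pair of helpers, a caller can no longer forget one half of the protocol.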

(2) vfs_readlink:

It would be nice if all the vfs_* routines were essentially wrappers that
did some checking and then called the file-system-specific method.  This
isn't the case for vfs_readlink.  (BTW, we like the vfs_* routines very much!)
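To show the "check, then dispatch" shape we mean, here is a userspace sketch under stated assumptions: the inode and inode_operations structs are simplified toys, and vfs_readlink_wrapper() and toyfs_readlink() are hypothetical names, deliberately not the real vfs_readlink().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins (hypothetical) for the kernel's inode and its operations. */
struct inode;
struct inode_operations {
	int (*readlink)(struct inode *inode, char *buf, int buflen);
};
struct inode {
	const struct inode_operations *i_op;
	const char *link;	/* symlink target, used by the toy method */
};

/* Generic checks first, then hand off to the f/s method. */
int vfs_readlink_wrapper(struct inode *inode, char *buf, int buflen)
{
	if (buflen <= 0)
		return -EINVAL;
	if (!inode->i_op || !inode->i_op->readlink)
		return -EINVAL;		/* not a symlink */
	return inode->i_op->readlink(inode, buf, buflen);
}

/* Example f/s method: copy the target without a trailing NUL, truncating
 * to buflen, and return the number of bytes copied (readlink(2) style). */
static int toyfs_readlink(struct inode *inode, char *buf, int buflen)
{
	int len = (int) strlen(inode->link);

	if (len > buflen)
		len = buflen;
	memcpy(buf, inode->link, (size_t) len);
	return len;
}

static const struct inode_operations toyfs_symlink_iops = { toyfs_readlink };
```

A stacking f/s could then call the wrapper on the lower inode and get the same checks the system call path gets.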

(3) "__" routines:

In fs/namei.c, vfs_follow_link simply calls __vfs_follow_link with the same,
unchanged args.  Can't we simplify things and get rid of the
__vfs_follow_link routine?  Failing that, page_follow_link should at least
call vfs_follow_link directly.

(4) permission:

fs/namei.c:permission() should probably be renamed to vfs_permission, b/c
it is a generic VFS routine (and we make direct use of it in lofs).

BTW, with n stacked layers, "permission" gets called O(n^2) times in total
for a single operation.  I'm not sure there's anything that can be done
about it now, but it's something to keep in mind.  Here's the recursive call
sequence when we have lofs mounted on, say, ext2 (just one stack level):

vfs_create
  may_create
    permission
      lofs_permission
        permission
          ext2_permission
  lofs_create
    vfs_create
      may_create
        permission
          ext2_permission

This happens b/c we use the nicer/newer vfs_ routines.  However, since
permission() is also called from places other than vfs_ routines, we
must define permission in lofs, and thus it gets called recursively.
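To make the O(n^2) concrete, here is a toy counting model.  It is not kernel code: the layer numbering and the two functions only mimic the call sequence above, with layer 0 standing for ext2 and each higher layer for one more lofs.  One create through n stacked layers makes (n+1)(n+2)/2 permission() calls; n = 1 gives the three calls in the sequence above.

```c
/* Toy model (hypothetical): layer 0 is the base f/s (ext2),
 * layer k > 0 is the k-th lofs stacked on top of it. */
static long permission_calls;

/* permission() on layer k: lofs_permission() calls permission() on the
 * lower inode, so the check recurses all the way down to ext2. */
static void permission(int layer)
{
	permission_calls++;
	if (layer > 0)
		permission(layer - 1);
}

/* vfs_create() on layer k: may_create() checks permission on this layer,
 * then lofs_create() calls vfs_create() on the layer below. */
static void vfs_create(int layer)
{
	permission(layer);		/* may_create() */
	if (layer > 0)
		vfs_create(layer - 1);	/* lofs_create() -> lower vfs_create() */
}

/* Total permission() calls for one create through n stacked layers. */
long count_permission_calls(int n)
{
	permission_calls = 0;
	vfs_create(n);
	return permission_calls;
}
```

Each layer's vfs_create() pays for a permission walk over every layer beneath it, which is where the triangular number comes from.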

We thought we could solve the problem by not defining our own ->permission
method, b/c the real operations (mkdir, create, etc.) will call permission
on the lower f/s via the vfs_ routines anyway.  But that doesn't work,
b/c permission() is called explicitly in open_namei().

One possible solution is to create a vfs_open() routine which would do most
of the checks in filp_open, including permission(), but would take a dentry
rather than a filename.  Then filp_open could call vfs_open, and so could
we; right now we have to duplicate most of the filp_open code in our ->open
function.  This would also nicely solve the recursive permission problem,
as well as clean up filp_open().
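A toy model of the proposed split, assuming simplified userspace stand-ins (the struct, permission(), and vfs_open() here are hypothetical sketches, not kernel code).  filp_open() would do the name lookup and hand the dentry to vfs_open(); lofs would drop its own ->permission and have its ->open call vfs_open() on the lower dentry.  An open through n stacked layers then costs one generic check per layer, n + 1 in total, instead of a quadratic count.

```c
#include <errno.h>
#include <stddef.h>

/* Toy dentry (hypothetical): a lofs dentry points at its lower dentry. */
struct dentry {
	struct dentry *lower;	/* NULL for the base f/s (ext2) */
	int mode_ok;		/* outcome of the generic mode-bit check */
};

static int permission_calls;

/* Generic permission check: lofs defines no ->permission of its own,
 * so nothing here recurses into the lower layer. */
static int permission(struct dentry *d)
{
	permission_calls++;
	return d->mode_ok ? 0 : -EACCES;
}

/* Proposed vfs_open(): the checks filp_open() does today, on a dentry.
 * filp_open() would look the name up and call this; lofs ->open would
 * call it again on the lower dentry instead of copying filp_open(). */
int vfs_open(struct dentry *d)
{
	int err = permission(d);

	if (err)
		return err;
	if (d->lower)
		return vfs_open(d->lower);	/* lofs ->open */
	return 0;				/* base f/s ->open would run here */
}
```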

(5) llseek:

In fs/read_write.c, llseek should probably be renamed vfs_llseek, and the
un/lock_kernel that it calls should be moved to sys_llseek.  Then vfs_llseek
should be exported so we can use it.
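A sketch of the suggested layering, assuming simplified userspace stand-ins: long long replaces loff_t, lock_kernel()/unlock_kernel() only count nesting, and sys_llseek_model() is a hypothetical name for the syscall side.  vfs_llseek() is pure dispatch with no locking, so a stacking f/s could call it directly, while the syscall path keeps the big kernel lock.

```c
#include <errno.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END, NULL */

/* Toy stand-ins (hypothetical) for struct file and its operations. */
struct file;
struct file_operations {
	long long (*llseek)(struct file *file, long long offset, int origin);
};
struct file {
	long long f_pos;
	long long i_size;	/* stands in for f_dentry->d_inode->i_size */
	const struct file_operations *f_op;
};

static int kernel_lock_depth;	/* models lock_kernel()/unlock_kernel() */
static void lock_kernel(void)   { kernel_lock_depth++; }
static void unlock_kernel(void) { kernel_lock_depth--; }

/* Generic seek, used when the f/s provides no ->llseek. */
static long long default_llseek(struct file *file, long long offset, int origin)
{
	switch (origin) {
	case SEEK_END: offset += file->i_size; break;
	case SEEK_CUR: offset += file->f_pos;  break;
	case SEEK_SET: break;
	default: return -EINVAL;
	}
	if (offset < 0)
		return -EINVAL;
	file->f_pos = offset;
	return offset;
}

/* Proposed vfs_llseek(): dispatch only, no locking, safe to export. */
long long vfs_llseek(struct file *file, long long offset, int origin)
{
	if (file->f_op && file->f_op->llseek)
		return file->f_op->llseek(file, offset, origin);
	return default_llseek(file, offset, origin);
}

/* The syscall keeps the big kernel lock around the call. */
long long sys_llseek_model(struct file *file, long long offset, int origin)
{
	long long ret;

	lock_kernel();
	ret = vfs_llseek(file, offset, origin);
	unlock_kernel();
	return ret;
}
```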

(6) vfs_readdir:

vfs_readdir doesn't take its arguments in the same order as the file
system's ->readdir method, which can be *very* confusing since all the other
vfs_* routines use the same prototype as the method they call.  I suggest you
make the two the same: swap the "dirent" and "filldir" args in vfs_readdir()
so they're in the same order everywhere.  We've had some amusing (read:
nasty :) kernel panics b/c of that.
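A userspace sketch of the proposed prototype, assuming simplified types: the filldir_t below uses long/unsigned long instead of the kernel's off_t/ino_t, and toyfs_readdir(), struct name_list, and collect_name() are hypothetical.  vfs_readdir() takes (file, dirent, filldir) in exactly the order ->readdir does, so a caller that stacks on top of it never has to remember which one is swapped.

```c
#include <errno.h>
#include <string.h>

/* Simplified filldir callback type (hypothetical field types). */
typedef int (*filldir_t)(void *dirent, const char *name, int namlen,
			 long offset, unsigned long ino);

struct file;
struct file_operations {
	int (*readdir)(struct file *file, void *dirent, filldir_t filldir);
};
struct file {
	const struct file_operations *f_op;
};

/* Proposed prototype: (file, dirent, filldir), same order as ->readdir. */
int vfs_readdir(struct file *file, void *dirent, filldir_t filldir)
{
	if (!file->f_op || !file->f_op->readdir)
		return -ENOTDIR;
	return file->f_op->readdir(file, dirent, filldir);
}

/* Toy ->readdir: emits three entries until filldir asks it to stop. */
static int toyfs_readdir(struct file *file, void *dirent, filldir_t filldir)
{
	static const char *names[] = { ".", "..", "hello" };
	int i;

	(void) file;
	for (i = 0; i < 3; i++)
		if (filldir(dirent, names[i], (int) strlen(names[i]),
			    i, (unsigned long) i + 1) < 0)
			break;
	return 0;
}

static const struct file_operations toyfs_dir_fops = { toyfs_readdir };

/* The caller's buffer ("dirent") and callback ("filldir"). */
struct name_list {
	char text[64];
	int count;
};

static int collect_name(void *dirent, const char *name, int namlen,
			long offset, unsigned long ino)
{
	struct name_list *list = dirent;
	size_t used = strlen(list->text);

	(void) offset;
	(void) ino;
	if (used + (size_t) namlen + 2 > sizeof list->text)
		return -1;		/* full: ask ->readdir to stop */
	memcpy(list->text + used, name, (size_t) namlen);
	list->text[used + namlen] = ' ';
	list->text[used + namlen + 1] = '\0';
	list->count++;
	return 0;
}
```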

Cheers,
Erez & Ion.



Re: [Request]JFS paper

2000-03-19 Thread sbest



kyung park wrote:
>
> Hello,
>
> My name is Kyung Park, a graduate student looking for papers about
> JFS in Linux. I am preparing a term paper about journaled file
> systems in Linux, so I need as many related papers as possible,
> preferably ones published in conferences, journals, or workshops.
>
> If you have such a paper or know where to find one, please let me
> know. Your help will be highly appreciated. If you want to know
> anything more from me, don't hesitate to reply.
>
> Regards,
> Kyung

JFS overview white paper is available at
http://oss.software.ibm.com/developer/opensource/jfs/

Steve




Re: [Request]JFS paper

2000-03-19 Thread Hans Reiser

kyung park wrote:
> 
> Hello,
> 
> My name is Kyung Park, a graduate student looking for papers about
> JFS in Linux. I am preparing a term paper about journaled file
> systems in Linux, so I need as many related papers as possible,
> preferably ones published in conferences, journals, or workshops.
> 
> If you have such a paper or know where to find one, please let me
> know. Your help will be highly appreciated. If you want to know
> anything more from me, don't hesitate to reply.
> 
> Regards,
> Kyung

ReiserFS is described at the URL in my signature.


Hans
-- 
You can get ReiserFS at http://devlinux.org/namesys, and customizations and
industrial grade support at [EMAIL PROTECTED]



[Request]JFS paper

2000-03-19 Thread kyung park

Hello,

My name is Kyung Park, a graduate student looking for papers about
JFS in Linux. I am preparing a term paper about journaled file
systems in Linux, so I need as many related papers as possible,
preferably ones published in conferences, journals, or workshops.

If you have such a paper or know where to find one, please let me
know. Your help will be highly appreciated. If you want to know
anything more from me, don't hesitate to reply.

Regards,
Kyung




Re: optimal number of files in directory on ext2

2000-03-19 Thread Hans Reiser

[EMAIL PROTECTED] wrote:

> 
> Finally, note that by using multiple directories, what you are
> effectively doing is a radix search.  For certain datasets (and from
> what I can tell, your file name space may very well qualify), a
> carefully designed radix search tree can be more efficient than a
> B-tree.

I don't understand this point.  Can you elaborate?

Hans

-- 
You can get ReiserFS at http://devlinux.org/namesys, and customizations and
industrial grade support at [EMAIL PROTECTED]



Re: /proc/mounts, /dev/root

2000-03-19 Thread Andrew Clausen

"Patrick J. LoPresti" wrote:
> 
> Try "cat /proc/sys/kernel/real-root-dev".  Convert to hex and maybe
> you will see something familiar.  Or maybe not :-).

This is exactly what I need!  Thanks!

Andrew Clausen
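The conversion Patrick suggests can also be done programmatically.  This is a sketch assuming the old 16-bit device numbering of 2.2/2.3-era kernels (major number in the high byte, minor in the low byte); decode_old_dev() and print_real_root_dev() are hypothetical helper names.

```c
#include <stdio.h>

/* Split an old-style 16-bit device number (8-bit major, 8-bit minor). */
void decode_old_dev(unsigned int dev, unsigned int *major, unsigned int *minor)
{
	*major = (dev >> 8) & 0xff;
	*minor = dev & 0xff;
}

/* Read /proc/sys/kernel/real-root-dev (a decimal number) and print it in
 * hex and as major:minor.  Returns 0 on success, -1 on failure. */
int print_real_root_dev(void)
{
	FILE *f = fopen("/proc/sys/kernel/real-root-dev", "r");
	unsigned int dev, major, minor;

	if (!f) {
		perror("real-root-dev");
		return -1;
	}
	if (fscanf(f, "%u", &dev) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);
	decode_old_dev(dev, &major, &minor);
	printf("0x%04x = major %u, minor %u\n", dev, major, minor);
	return 0;
}
```

For example, a value of 769 is 0x0301: major 3, minor 1, i.e. /dev/hda1.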



Re: optimal number of files in directory on ext2

2000-03-19 Thread tytso

   Date:   Wed, 15 Mar 2000 23:08:38 +0100 (CET)
   From: Michal Pise <[EMAIL PROTECTED]>

   I was asked to create an application which will handle about 100 000 000
   relatively small (about 1-2k) files. I decided to make a
   multi-level directory tree, but I don't know what the optimal number
   of files in one directory is. Is it 10, 100, 1000 or even more?

   Someone told me that a directory should be small enough that all its
   entries fit in the directly addressed blocks: with 4kB blocks that's
   12*4kB = 48kB of space for filenames, which for short filenames is at
   least 1000 files in one directory.

   Another guy told me that a directory should be small enough that all
   its entries fit in the first block. That's about 200 short filenames
   in one directory.

   But if I create 14 directories each with 14 files, the average lookup time
   will be 2 * 7 * time_to_explore_directory_entry + time_to_descend_into_dir
   instead of 100 * time_to_explore_directory_entry. And I think that
   time_to_descend_into_dir < 80 * time_to_explore_directory_entry. So it's
   better to create a tree with many levels and only a few files in each
   directory.

   Am I right? Did I miss something?

The important point to consider is that actually searching a directory
block is cheap (mere CPU time).  What is expensive is disk I/O.  This
comes from (a) reading the directory block, and (b) reading the inode
block to find out where the directory entries are.

Given that you will need to read the directory blocks to get at the data
regardless, the only cost to consider between the two approaches is
reading the inode data.  By using a flatter directory hierarchy, Linux
will need to read in fewer inodes, and those inodes, once read into
memory, will more likely be cached.
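The inode argument can be put in numbers with a rough model.  It assumes every directory holds at most `fanout` entries and the tree is filled level by level; tree_shape() is a hypothetical helper, not part of any ext2 tool.  Each lookup walks `depth` directories, and every directory is an inode that must be read or found in cache.

```c
/* Number of directory levels needed to hold 'nfiles' files with at most
 * 'fanout' entries per directory; the total directory count goes to
 * *ndirs.  Returns -1 for degenerate input. */
int tree_shape(unsigned long long nfiles, unsigned long long fanout,
	       unsigned long long *ndirs)
{
	unsigned long long level = nfiles;	/* items at the current level */
	unsigned long long total = 0;
	int depth = 0;

	if (fanout < 2 || nfiles == 0)
		return -1;
	do {
		/* directories needed to hold the items below (ceiling division) */
		level = (level + fanout - 1) / fanout;
		total += level;
		depth++;
	} while (level > 1);
	*ndirs = total;
	return depth;
}
```

For 100,000,000 files, a fanout of 10 needs 8 levels and 11,111,111 directories, while a fanout of 1000 needs 3 levels and only 100,101 directories: far fewer inodes to read, and a far better chance that they stay cached.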

One further trick you can play is to pre-allocate space in the
directories so that their blocks are contiguously allocated.  See the
source to mklost+found (included below, since it's so small) for an
example of how to do this.  This will further speed up the search, and
make the fewer-directory-levels approach even more attractive, since you
avoid forcing the disk to seek all over the place.

Finally, note that by using multiple directories, what you are
effectively doing is a radix search.  For certain datasets (and from what
I can tell, your file name space may very well qualify), a carefully
designed radix search tree can be more efficient than a B-tree.  So even
if you decide in the future to move to a different filesystem that has
B-tree directories, such as xfs, reiserfs, or a future version of ext3,
you may still want to consider keeping this structure in your program.
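A sketch of what the radix analogy means in practice, assuming each file is identified by a number (a hypothetical naming scheme; radix_path() is an illustrative helper).  Fixed-width digit groups of the number pick the directory at each level, just as a radix tree picks a child by digit.  With base-100 digits, the numbers 0 to 99,999,999 need four groups: three directory levels of 100 entries each, plus the file name.

```c
#include <stdio.h>
#include <string.h>

/* Build a radix-100 path for file number n (0 <= n < 100^levels):
 * each level consumes two decimal digits, most significant first.
 * Returns 0 on success, -1 if n doesn't fit or buf is too small. */
int radix_path(unsigned long n, int levels, char *buf, size_t len)
{
	unsigned long digits[16];
	size_t used = 0;
	int i;

	if (levels < 1 || levels > 16)
		return -1;
	for (i = levels - 1; i >= 0; i--) {	/* peel off base-100 digits */
		digits[i] = n % 100;
		n /= 100;
	}
	if (n != 0)
		return -1;			/* too many digits for 'levels' */
	for (i = 0; i < levels; i++) {
		int w = snprintf(buf + used, len - used,
				 i ? "/%02lu" : "%02lu", digits[i]);
		if (w < 0 || (size_t) w >= len - used)
			return -1;		/* buffer too small */
		used += (size_t) w;
	}
	return 0;
}
```

For example, file 12345678 lands at "12/34/56/78": no directory is ever searched beyond its 100 entries, whatever the total number of files.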

- Ted

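/*
 * mklost+found.c	- Creates a directory lost+found on a mounted second
 *			  extended file system
 *
 * Copyright (C) 1992, 1993  Remy Card <[EMAIL PROTECTED]>
 *
 * This file can be redistributed under the terms of the GNU General
 * Public License
 */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>

#include <linux/ext2_fs.h>	/* EXT2_NAME_LEN, EXT2_NDIR_BLOCKS */

#define LPF "lost+found"

int main (void)
{
	char name [EXT2_NAME_LEN + 1];	/* longest ext2 name plus NUL */
	char path [sizeof (LPF) + 1 + EXT2_NAME_LEN + 1];
	struct stat st;
	int i, j;
	int d;

	if (mkdir (LPF, 0755) == -1) {
		perror ("mkdir");
		exit (1);
	}

	/*
	 * Fill the new directory with long (254-character) names until it
	 * has grown into all EXT2_NDIR_BLOCKS direct blocks, so that those
	 * blocks are allocated now, one after another.  The 4-digit suffix
	 * keeps each name unique and within EXT2_NAME_LEN.
	 */
	i = 0;
	memset (name, 'x', 250);
	do {
		snprintf (name + 250, sizeof (name) - 250, "%04d", i);
		snprintf (path, sizeof (path), "%s/%s", LPF, name);
		if ((d = creat (path, 0644)) == -1) {
			perror ("creat");
			exit (1);
		}
		i++;
		close (d);
		if (stat (LPF, &st) == -1) {
			perror ("stat");
			exit (1);
		}
	} while (st.st_size <= (EXT2_NDIR_BLOCKS - 1) * st.st_blksize);

	/*
	 * Remove the placeholder files.  ext2 never shrinks a directory, so
	 * lost+found keeps its pre-allocated blocks for e2fsck to use.
	 */
	for (j = 0; j < i; j++) {
		snprintf (name + 250, sizeof (name) - 250, "%04d", j);
		snprintf (path, sizeof (path), "%s/%s", LPF, name);
		if (unlink (path) == -1) {
			perror ("unlink");
			exit (1);
		}
	}
	exit (0);
}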