Re: [RFC] set MS_NOATIME on FAT ?

2005-03-14 Thread OGAWA Hirofumi
Werner Almesberger <[EMAIL PROTECTED]> writes:

> Ah, I see. But, at the moment, VFAT doesn't set atime from adate,
> and vice versa, or have I overlooked something ?

Right. However, if you need NOATIME, you can set it with mount
options.  And I think we just need to fix ->adate; there is no need
to change the default options.

Thanks.
-- 
OGAWA Hirofumi <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] set MS_NOATIME on FAT ?

2005-03-14 Thread Werner Almesberger
OGAWA Hirofumi wrote:
> No. The fatfs has the ->adate, so I think we should update it rather.

Ah, I see. But, at the moment, VFAT doesn't set atime from adate,
and vice versa, or have I overlooked something ?

Thanks,
- Werner

-- 
  _
 / Werner Almesberger, Buenos Aires, Argentina [EMAIL PROTECTED] /
/_http://www.almesberger.net//


Re: [RFC] set MS_NOATIME on FAT ?

2005-03-14 Thread OGAWA Hirofumi
Werner Almesberger <[EMAIL PROTECTED]> writes:

> as far as I can tell, none of FAT or its offsprings use atime, so
> perhaps fs/fat/inode.c should just set MS_NOATIME, so that we don't
> get unnecessary "inode" writes ?

No. The fatfs has the ->adate, so I think we should rather update it.

Thanks.
-- 
OGAWA Hirofumi <[EMAIL PROTECTED]>


[RFC] set MS_NOATIME on FAT ?

2005-03-14 Thread Werner Almesberger
Hi,

as far as I can tell, none of FAT or its offspring use atime, so
perhaps fs/fat/inode.c should just set MS_NOATIME, so that we don't
get unnecessary "inode" writes ? (They hurt if you want to reduce
worst-case latency in the write path.)

Here's a patch for 2.6.11 (with some offset, because I pulled it
from a larger patch).

Does this look good ?

Thanks,
- Werner

-- cut here ---

Signed-off-by: Werner Almesberger <[EMAIL PROTECTED]>

--- linux-2.6.11-orig/fs/fat/inode.c	Wed Mar  2 04:38:08 2005
+++ linux-2.6.11/fs/fat/inode.c	Thu Mar  3 01:35:57 2005
@@ -413,7 +483,7 @@ static void __exit fat_destroy_inodecach
 
 static int fat_remount(struct super_block *sb, int *flags, char *data)
 {
-	*flags |= MS_NODIRATIME;
+	*flags |= MS_NODIRATIME | MS_NOATIME;
 	return 0;
 }
 
@@ -1058,7 +1128,7 @@ int fat_fill_super(struct super_block *s
 	sb->s_fs_info = sbi;
 	memset(sbi, 0, sizeof(struct msdos_sb_info));
 
-	sb->s_flags |= MS_NODIRATIME;
+	sb->s_flags |= MS_NODIRATIME | MS_NOATIME;
 	sb->s_magic = MSDOS_SUPER_MAGIC;
 	sb->s_op = &fat_sops;
 	sb->s_export_op = &fat_export_ops;
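
For readers unfamiliar with the two flags, here is a small userspace
model of what the patch changes. It is only a sketch: the sk_* names
and flag values are made up for illustration and are not the kernel's
MS_* definitions, and the decision function is an approximation of the
check made before an access time is updated.

```c
#include <stdbool.h>

/* Illustrative flag bits. The names echo MS_NOATIME / MS_NODIRATIME,
 * but the values are local to this sketch. */
enum {
    SK_NOATIME    = 1 << 0,  /* never record access times */
    SK_NODIRATIME = 1 << 1   /* do not record access times on directories */
};

/* True when a read access would dirty the inode to record a new atime. */
bool sk_wants_atime_update(unsigned flags, bool is_dir)
{
    if (flags & SK_NOATIME)
        return false;
    if (is_dir && (flags & SK_NODIRATIME))
        return false;
    return true;
}

/* Models stock 2.6.11 FAT: only directory atime updates are disabled. */
unsigned sk_fat_flags_stock(unsigned flags)
{
    return flags | SK_NODIRATIME;
}

/* Models the proposed patch: atime updates are disabled for everything. */
unsigned sk_fat_flags_patched(unsigned flags)
{
    return flags | SK_NODIRATIME | SK_NOATIME;
}
```

With the stock flags, reading a regular file still asks for an inode
update; with the patched flags it does not, which is the write the
patch is trying to avoid.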

-- 
  _
 / Werner Almesberger, Buenos Aires, Argentina [EMAIL PROTECTED] /
/_http://www.almesberger.net//


Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-14 Thread Werner Almesberger
Bryan Henderson wrote:
> I think "reservation" is wrong for one of them and anyone using it that 
> way should stop.

Hehe, start with ext3 :-)

> I believe the common terminology is:

Sounds reasonable. The thing with "reservation" is that people use
it in daily life with all kinds of meanings, and often with the
object of the reservation, e.g. "reserve a seat" (typically a
specific seat), "reserve some time" (often not a specific interval),
or "reserve a table" (at a restaurant, you don't know which one,
but the restaurant staff does).

To muddy the issue further, reservations can be more or less firm.
E.g. if we "reserve" the next hundred blocks, so that allocation is
contiguous, we may want to be able to take them away if some other
file needs them. On the other hand, if storage is already committed,
but just not on disk yet, that reservation shouldn't be revocable.

- Werner

-- 
  _
 / Werner Almesberger, Buenos Aires, Argentina [EMAIL PROTECTED] /
/_http://www.almesberger.net//


Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-14 Thread Werner Almesberger
Alex Tomas wrote:
> you can drop PG_locked right as you set PG_writeback, I think

Hmm, not sure. mpage_writepage never calls writepage with PG_writeback,
only with PG_locked. Also, mpage_writepage calls get_block with
PG_locked, so the allocation, which may take a while, holds the lock.

This situation is admittedly a bit annoying: on the one hand, "sync"
should write all dirty data. On the other hand, if a random user
typing "sync" can break performance guarantees, these guarantees
aren't very valuable.

- Werner

-- 
  _
 / Werner Almesberger, Buenos Aires, Argentina [EMAIL PROTECTED] /
/_http://www.almesberger.net//


Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-14 Thread Bryan Henderson
>Hmm, it's a bit confusing that we call both things "reservation".

I think "reservation" is wrong for one of them and anyone using it that 
way should stop.  I believe the common terminology is:

- choosing the blocks is "placement."

- committing the required number of blocks from the resource pool for the 
instant use is "reservation."

- the combination of reservation and placement is "allocation."

Obviously, traditional filesystem drivers haven't split placement from 
reservation, so don't bother to use those terms.

Most delaying schemes delay the placement but not the reservation because 
they don't want to accept the possibility that a write would fail for lack 
of space after the write() system call succeeded.

Even in non-filesystem areas, "allocate" usually means to assign 
particular resources, while "reserve" just means to make arrangements so 
that a future allocate will succeed.  For example, if you know you need up 
to 10 blocks of memory to complete a task without deadlocking, but you 
don't know yet exactly how many, you would reserve 10 blocks and 
later, if necessary, allocate the actual blocks.
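
The three terms can be sketched in a few lines of userspace C. This is
an illustration under assumptions, not code from any file system: the
sk_* names, the 16-block pool, and the first-fit placement policy are
all invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_NBLOCKS 16

struct sk_pool {
    bool   used[SK_NBLOCKS]; /* placement state: which blocks are taken */
    size_t reserved;         /* blocks promised but not yet placed */
};

/* Free blocks that have not already been promised to someone. */
size_t sk_unreserved(const struct sk_pool *p)
{
    size_t free_blocks = 0;
    for (size_t i = 0; i < SK_NBLOCKS; i++)
        if (!p->used[i])
            free_blocks++;
    return free_blocks - p->reserved;
}

/* Reservation: commit n blocks from the pool without choosing which ones. */
bool sk_reserve(struct sk_pool *p, size_t n)
{
    if (n > sk_unreserved(p))
        return false;
    p->reserved += n;
    return true;
}

/* Placement: turn one reserved block into a specific block number.
 * Returns -1 if nothing is currently reserved. */
int sk_place(struct sk_pool *p)
{
    if (p->reserved == 0)
        return -1;
    for (int i = 0; i < SK_NBLOCKS; i++) {
        if (!p->used[i]) {
            p->used[i] = true;
            p->reserved--;
            return i;
        }
    }
    return -1; /* not reached while reserved <= number of free blocks */
}

/* Allocation: reservation and placement in a single step. */
int sk_allocate(struct sk_pool *p)
{
    return sk_reserve(p, 1) ? sk_place(p) : -1;
}
```

A delaying scheme in this picture calls sk_reserve() at write() time,
so that a lack of space is reported immediately, and calls sk_place()
later.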

--
Bryan Henderson  IBM Almaden Research Center
San Jose CA  Filesystems



RFC: exporting per-superblock statistics to user space

2005-03-14 Thread Chuck Lever
we still have a need to provide "iostat" like statistics for NFS 
clients.  attached are a couple of patches, against 2.6.11.3, which 
prototype an approach for providing this kind of data to user programs.
i'd like some comments on the approach.

01-mountstats.patch adds a new file called /proc/self/mountstats and a 
new file system hook called show_stats.  this just replicates 
/proc/mounts and the show_options hook.

02-nfs-iostat.patch teaches the NFS client to use the new show_stats hook 
as a demonstration.

note that this approach addresses previously voiced concerns about 
exporting per-superblock stats to user space.

1.  processes can't see stats for file systems mounted outside their
    namespace.
2.  reading the stats file is serialized with mount and unmount operations.
3.  the approach doesn't use /sys or kobjects.
4.  there are no lifetime issues tied to file systems loaded as a module.
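
The iterator protocol behind such a file (start, next, stop, show) can
be modelled in userspace as below. This is a sketch only: every sk_*
name is hypothetical, a reader counter stands in for the namespace
semaphore, and the stats string is a placeholder for whatever a file
system's show_stats hook would print.

```c
#include <stddef.h>
#include <stdio.h>

struct sk_mount {
    const char *dev, *dir, *fstype;
    const char *stats;        /* NULL when the fs has no show_stats hook */
    struct sk_mount *next;
};

struct sk_ns {
    struct sk_mount *head;
    int readers;              /* stand-in for the namespace read lock */
};

struct sk_seq_ops {
    void *(*start)(void *priv, long *pos);
    void *(*next)(void *priv, void *v, long *pos);
    void  (*stop)(void *priv, void *v);
    int   (*show)(FILE *out, void *v);
};

void *sk_ms_start(void *priv, long *pos)
{
    struct sk_ns *ns = priv;
    struct sk_mount *m = ns->head;
    long skip = *pos;

    ns->readers++;            /* take the lock for the whole walk */
    while (m && skip-- > 0)
        m = m->next;
    return m;
}

void *sk_ms_next(void *priv, void *v, long *pos)
{
    (void)priv;
    (*pos)++;
    return ((struct sk_mount *)v)->next;
}

void sk_ms_stop(void *priv, void *v)
{
    (void)v;
    ((struct sk_ns *)priv)->readers--;  /* drop the lock */
}

int sk_ms_show(FILE *out, void *v)
{
    const struct sk_mount *m = v;

    if (m->dev)
        fprintf(out, "device %s", m->dev);
    else
        fputs("no device", out);
    fprintf(out, " mounted on %s with fstype %s", m->dir, m->fstype);
    if (m->stats)             /* optional per-fs statistics */
        fprintf(out, " %s", m->stats);
    fputc('\n', out);
    return 0;
}

const struct sk_seq_ops sk_mountstats_op = {
    sk_ms_start, sk_ms_next, sk_ms_stop, sk_ms_show
};

/* Drives one full pass; returns the number of records shown. */
int sk_seq_dump(const struct sk_seq_ops *ops, void *priv, FILE *out)
{
    long pos = 0;
    int shown = 0;
    void *v = ops->start(priv, &pos);

    while (v && ops->show(out, v) == 0) {
        shown++;
        v = ops->next(priv, v, &pos);
    }
    ops->stop(priv, v);
    return shown;
}
```

Because the lock is held from start to stop, a concurrent unmount in
this model would have to wait until the pass is over, which is the
serialization property claimed in point 2.
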
 [PATCH] VFS: New /proc file /proc/self/mountstats
 
 Create a new file under /proc/self, called mountstats, where mounted file
 systems can export information (configuration options, performance counters,
 and so on).  Use a mechanism similar to /proc/mounts and s_ops->show_options.

 This mechanism does not violate namespace security, and is safe to use while
 other processes are unmounting file systems.

 Test-plan:
 Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats.

 Version: Mon, 14 Mar 2005 17:06:04 -0500
 
 Signed-off-by: Chuck Lever <[EMAIL PROTECTED]>
---
 
 fs/namespace.c     |   66 +
 fs/proc/base.c     |   40 +++
 include/linux/fs.h |    1 +
 3 files changed, 107 insertions(+)
 
 
diff -X /home/cel/src/linux/dont-diff -Naurp 00-stock/fs/namespace.c 01-mountstats/fs/namespace.c
--- 00-stock/fs/namespace.c	2005-03-02 02:38:13.0 -0500
+++ 01-mountstats/fs/namespace.c	2005-03-14 15:24:51.565085000 -0500
@@ -265,6 +265,72 @@ struct seq_operations mounts_op = {
 	.show	= show_vfsmnt
 };
 
+/* iterator */
+static void *ms_start(struct seq_file *m, loff_t *pos)
+{
+	struct namespace *n = m->private;
+	struct list_head *p;
+	loff_t l = *pos;
+
+	down_read(&n->sem);
+	list_for_each(p, &n->list)
+		if (!l--)
+			return list_entry(p, struct vfsmount, mnt_list);
+	return NULL;
+}
+
+static void *ms_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct namespace *n = m->private;
+	struct list_head *p = ((struct vfsmount *)v)->mnt_list.next;
+	(*pos)++;
+	return p==&n->list ? NULL : list_entry(p, struct vfsmount, mnt_list);
+}
+
+static void ms_stop(struct seq_file *m, void *v)
+{
+	struct namespace *n = m->private;
+	up_read(&n->sem);
+}
+
+static int show_vfsstat(struct seq_file *m, void *v)
+{
+	struct vfsmount *mnt = v;
+	int err = 0;
+
+	/* device */
+	if (mnt->mnt_devname) {
+		seq_puts(m, "device ");
+		mangle(m, mnt->mnt_devname);
+	} else
+		seq_puts(m, "no device");
+
+	/* mount point */
+	seq_puts(m, " mounted on ");
+	seq_path(m, mnt, mnt->mnt_root, " \t\n\\");
+	seq_putc(m, ' ');
+
+	/* file system type */
+	seq_puts(m, "with fstype ");
+	mangle(m, mnt->mnt_sb->s_type->name);
+
+	/* optional statistics */
+	if (mnt->mnt_sb->s_op->show_stats) {
+		seq_putc(m, ' ');
+		err = mnt->mnt_sb->s_op->show_stats(m, mnt);
+	}
+
+	seq_putc(m, '\n');
+	return err;
+}
+
+struct seq_operations mountstats_op = {
+	.start	= ms_start,
+	.next	= ms_next,
+	.stop	= ms_stop,
+	.show	= show_vfsstat,
+};
+
 /**
  * may_umount_tree - check if a mount tree is busy
  * @mnt: root of mount tree
diff -X /home/cel/src/linux/dont-diff -Naurp 00-stock/fs/proc/base.c 01-mountstats/fs/proc/base.c
--- 00-stock/fs/proc/base.c	2005-03-02 02:38:12.0 -0500
+++ 01-mountstats/fs/proc/base.c	2005-03-14 15:24:51.571085000 -0500
@@ -60,6 +60,7 @@ enum pid_directory_inos {
 	PROC_TGID_STATM,
 	PROC_TGID_MAPS,
 	PROC_TGID_MOUNTS,
+	PROC_TGID_MOUNTSTATS,
 	PROC_TGID_WCHAN,
 #ifdef CONFIG_SCHEDSTATS
 	PROC_TGID_SCHEDSTAT,
@@ -91,6 +92,7 @@ enum pid_directory_inos {
 	PROC_TID_STATM,
 	PROC_TID_MAPS,
 	PROC_TID_MOUNTS,
+	PROC_TID_MOUNTSTATS,
 	PROC_TID_WCHAN,
 #ifdef CONFIG_SCHEDSTATS
 	PROC_TID_SCHEDSTAT,
@@ -134,6 +136,7 @@ static struct pid_entry tgid_base_stuff[
 	E(PROC_TGID_ROOT,  "root",S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_EXE,   "exe", S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_MOUNTS,"mounts",  S_IFREG|S_IRUGO),
+	E(PROC_TGID_MOUNTSTATS, "mountstats", S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
 	E(PROC_TGID_ATTR,  "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -164,6 +167,7 @@ static struct pid_entry tid_base_stuff[]
 	E(PROC_TID_ROOT,   "root",S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_EXE,"exe", S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_MOUNTS, "mounts",  S_IFREG|S_IRUGO),
+	E(PROC_TID_MOUNTSTATS, "mountstats", S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
 	E(PROC_TID_ATTR,   "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -528,6 +532,38 @@ static struct file_operations proc_mount
 	.release	= mounts_rele

Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-14 Thread Alex Tomas
> Werner Almesberger (WA) writes:

 >> locked during writeback? PG_writeback should be used instead of PG_locked.

 WA> In mpage_writepages, writepage can also get called with the page just
 WA> PG_locked.

you can drop PG_locked right as you set PG_writeback, I think

thanks, Alex



Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-14 Thread Werner Almesberger
Alex Tomas wrote:
> I see no reason to reserve specific block in ->prepare/->commit in
> delayed allocation case. We already do this with reservation.

This seems like a sensible approach to me. Trying to reserve specific
blocks in an FS-independent way was what got us in trouble on ABISS.
So the plan B is to add this kind of reservation to where it is really
lacking (i.e. FAT).

Hmm, it's a bit confusing that we call both things "reservation".
Well, airlines do this too, "free seating".

> locked during writeback? PG_writeback should be used instead of PG_locked.

In mpage_writepages, writepage can also get called with the page just
PG_locked.

- Werner

-- 
  _
 / Werner Almesberger, Buenos Aires, Argentina [EMAIL PROTECTED] /
/_http://www.almesberger.net//


Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-14 Thread Alex Tomas
> Werner Almesberger (WA) writes:

 WA> Do you plan to reserve space as "blocks, somewhere", or as "these
 WA> specific on-disk locations" ? In ABISS, we did something of the
 WA> latter kind (in order to make large contiguous allocations also on
 WA> FAT), and it turned out to be a big mess, because ABISS needed too
 WA> much support from the file system driver. So we just scrapped that
 WA> bit :-)

I see no reason to reserve specific block in ->prepare/->commit in
delayed allocation case. We already do this with reservation.
The sole point of delayed allocation is to allocate many blocks at once:
to minimize fragmentation, to decrease allocator involvement, to avoid
allocation at all if the file gets truncated quickly.
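
A toy model of these three benefits, as a sketch only: the sk_* names
are invented, the "disk" hands out blocks linearly, and a file
remembers just its most recently placed extent. None of this is taken
from the ext3 delalloc patches.

```c
#include <stdbool.h>
#include <stddef.h>

struct sk_disk {
    size_t nblocks;       /* total blocks on the device */
    size_t next_free;     /* blocks below this index are placed */
    size_t reserved;      /* blocks promised to pending writes */
    unsigned alloc_calls; /* how often the allocator was invoked */
};

struct sk_file {
    size_t delayed;       /* blocks written but not yet placed */
    size_t start, len;    /* most recently placed extent */
};

/* write(): only reserve space, so a full disk is reported right away. */
bool sk_write_block(struct sk_disk *d, struct sk_file *f)
{
    if (d->next_free + d->reserved >= d->nblocks)
        return false;     /* no space left: fail at write time */
    d->reserved++;
    f->delayed++;
    return true;
}

/* Truncating before the flush returns the reservation; nothing was placed. */
void sk_truncate(struct sk_disk *d, struct sk_file *f)
{
    d->reserved -= f->delayed;
    f->delayed = 0;
}

/* Flush: place all delayed blocks as one contiguous extent, in one call. */
size_t sk_flush(struct sk_disk *d, struct sk_file *f)
{
    size_t n = f->delayed;

    if (n == 0)
        return 0;         /* nothing pending: the allocator is not involved */
    f->start = d->next_free;
    f->len = n;
    d->next_free += n;
    d->reserved -= n;
    d->alloc_calls++;
    f->delayed = 0;
    return n;
}
```

Eight writes followed by one flush cost a single allocator call and
yield one eight-block extent; a short-lived file that is truncated
before the flush never reaches the allocator at all.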

 WA> The main parts: we added a new page flag, PG_delalloc, which
 WA> basically tells everyone to stay away from that page. There are
 WA> two purposes: (a) to make sure no allocation happens unless
 WA> explicitly requested, and (b) prevent the page from being written
 WA> back while it is still in ABISS' playout buffer. The reason for
 WA> (b) is that the page gets locked during writeback, which could
 WA> cause delays if the ABISS-using application then decides to
 WA> access the page.

locked during writeback? PG_writeback should be used instead of PG_locked.


thanks, Alex



Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-14 Thread Werner Almesberger
Suparna Bhattacharya wrote:
> I'm looking at whether we can do most of it at VFS level

Do you plan to reserve space as "blocks, somewhere", or as "these
specific on-disk locations" ? In ABISS, we did something of the
latter kind (in order to make large contiguous allocations also on
FAT), and it turned out to be a big mess, because ABISS needed too
much support from the file system driver. So we just scrapped that
bit :-)

> Of course, I haven't looked at how ABISS does delayed alloc -- 
> do you have a patch snippet I can look at ?

I just made a release. The kernel patch is in
abiss-7/kernel/abiss.patch  It's all in one big patch, sorry.
The main purpose of this is to see what we can achieve, so it's
not very polished.

The main parts: we added a new page flag, PG_delalloc, which
basically tells everyone to stay away from that page. There are
two purposes: (a) to make sure no allocation happens unless
explicitly requested, and (b) prevent the page from being written
back while it is still in ABISS' playout buffer. The reason for
(b) is that the page gets locked during writeback, which could
cause delays if the ABISS-using application then decides to
access the page.

The "hands off" code is mainly in fs/buffer.c, in the functions
__block_commit_write (set the page dirty, then go away),
cont_prepare_write (for FAT, do nothing),
block_prepare_write  (for ext2, do nothing),
and then fs/mpage.c:mpage_writepages (skip pages marked for
delayed allocation).
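
As an illustration of the "skip" part, a writeback pass that leaves
flagged pages alone could look like the following. This is a userspace
sketch with made-up sk_* names and flag values; it is not the code in
abiss.patch, and real page flags are manipulated with atomic bit
operations rather than plain masks.

```c
#include <stddef.h>

/* Illustrative page flags; values are local to this sketch. */
enum {
    SK_PG_DIRTY    = 1 << 0,  /* page has data to write back */
    SK_PG_DELALLOC = 1 << 1   /* hands off: allocation is being delayed */
};

struct sk_page {
    unsigned flags;
};

/* Writes back dirty pages but skips those marked for delayed allocation.
 * Returns the number of pages written. */
size_t sk_writepages(struct sk_page *pages, size_t n)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(pages[i].flags & SK_PG_DIRTY))
            continue;
        if (pages[i].flags & SK_PG_DELALLOC)
            continue;         /* still held by the playout buffer */
        pages[i].flags &= ~SK_PG_DIRTY;
        written++;
    }
    return written;
}

/* Called when the owner lets go of the page: writeback may now proceed. */
void sk_release_page(struct sk_page *pg)
{
    pg->flags &= ~SK_PG_DELALLOC;
}
```

A flagged page stays dirty across any number of writeback passes and
is only written once its owner has released it.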

cont_prepare_write also needs to handle the special case where
it has to fill holes in a file. In this case, it simply overrides
delayed allocation. This bit will need more work.

Since ABISS prefetches pages, cont_prepare_write and
block_prepare_write may now see pages that are already up to date,
so they must not zero them.

The prefetching happens in fs/abiss/sched_lib.c:abiss_read_page,
and writeback in abiss_put_page. We also experimented with
leaving the writeback to MM, but that led to OOM far too often.
The current solution works quite smoothly even if we tax the
system hard.

In order to keep things simple, I didn't try to make delayed
allocation do anything for writers that don't use ABISS.

The life cycle of a page is about as follows: when an application
reads or writes a file, ABISS maintains a playout buffer for it,
that typically reaches a few hundred kB ahead of the current file
position. Pages are prefetched and locked in the playout buffer.
The playout buffer is dimensioned such that when file data enters the
playout buffer, there is enough time for the data to be in memory
by the time the application reaches it.

ABISS just calls readpage to get the data, which either causes it
to be read from disk, or the page to be zeroed, if we're beyond
EOF or at a hole.

The application accesses the page through the normal VFS functions,
so in the case of writing, the prepare/commit process happens.

Once the application has accessed the page, and moves the playout
buffer beyond it, the page is released and written back to disk.
Prefetching and writeback are done in a separate kernel thread, so
the application does not get delayed.
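
The life cycle can be pictured as a window sliding over the file. The
sketch below is a simplified model, not the ABISS implementation: the
sk_* names are invented, positions are counted in pages, only forward
movement is handled, and prefetch and writeback are reduced to
counters.

```c
#include <stddef.h>

struct sk_playout {
    size_t pos;          /* page the application is currently at */
    size_t window;       /* pages kept ready ahead of pos */
    size_t head;         /* first page not yet prefetched */
    size_t tail;         /* first page not yet released */
    size_t prefetched;   /* pages brought into the buffer so far */
    size_t written_back; /* pages released and written back so far */
};

/* Moves the playout point forward: pages behind it are released for
 * writeback, and the window ahead of it is refilled by prefetching. */
void sk_playout_move(struct sk_playout *pb, size_t new_pos)
{
    pb->pos = new_pos;

    /* Release every page the application has moved past. */
    while (pb->tail < new_pos) {
        if (pb->tail < pb->head)   /* only pages that were in the buffer */
            pb->written_back++;
        pb->tail++;
    }
    if (pb->head < pb->tail)       /* jumped past the prefetched range */
        pb->head = pb->tail;

    /* Prefetch until the window ahead of the new position is full. */
    while (pb->head < new_pos + pb->window) {
        pb->prefetched++;
        pb->head++;
    }
}
```

In ABISS the two loops would run in the separate kernel thread, so the
application itself only pays for updating the position.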

- Werner

-- 
  _
 / Werner Almesberger, Buenos Aires, Argentina [EMAIL PROTECTED] /
/_http://www.almesberger.net//


Active Block I/O Scheduling System (ABISS), version 7

2005-03-14 Thread Werner Almesberger
The Active Block I/O Scheduling System (ABISS) is an extension of the
hard-disk storage subsystem of Linux, whose main purpose is to provide
a guaranteed reading and (eventually) writing bit rate to applications.

The ABISS project is conducted by Philips Research in Eindhoven, the Netherlands
(see http://www.research.philips.com/technologies/storage/index.html).

http://abiss.sourceforge.net/abiss-7.tar.gz
md5sum 081abbfa1d11ce268dab300576edc194
sha1sum 7851ebd768fc1a96207836b5189450c90e4ddd05

This release upgrades ABISS to the 2.6.11 kernel, brings some major
cleanup and introduces experimental support for writing with a
guaranteed rate. The highlights:

 - the "allocator" functionality has been completely removed. It
   was a very complicated way of doing things that can be done much
   more efficiently and cleanly in the file system driver, it
   complicated the inner workings of ABISS, and it wasn't of much use
   in its present state anyway.

 - removed the abiss_detach message, which was a no-op

 - this release adds an experimental mechanism for delayed allocation
   of file blocks. In its current form, this is mainly intended for
   exploring performance aspects, and may have yet undiscovered
   fascinating bugs. This may also be of interest for a broader
   audience, hence the cross-posting to linux-fsdevel.

 - ABISS now tries to guarantee the accepted data rate also when
   writing. For now, this only works for FAT and ext2, and when delayed
   allocations are enabled. All this is still very experimental and
   only works most of the time.

For additional information, please have a look at
http://abiss.sourceforge.net/

- Werner

-- 
  _
 / Werner Almesberger, Buenos Aires, Argentina [EMAIL PROTECTED] /
/_http://www.almesberger.net//


Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-14 Thread Suparna Bhattacharya
On Mon, Mar 14, 2005 at 05:36:58AM -0300, Werner Almesberger wrote:
> Mingming Cao wrote:
> > I agree delayed allocation make much sense with multiblock allocation.
> > But I still think itself worth the effort, even without multiple block
> > allocation.
> 
> On ABISS, we're currently also experimenting with delayed allocation.
> There, the goal is less to improve overall performance, but to move
> the accesses out of the synchronous code path for write(2).
> 
> The code works quite nicely for FAT and ext2, limiting the time it
> takes to make a write call writing new data to about 4-6 ms on a
> fairly sluggish machine (plus about 2-4 ms for moving the playout
> point, which is a separate operation in ABISS), and with eight
> competing best-effort writers who each enjoy write latencies of some
> 8 seconds, worst-case, overwriting old data.
> 
> Of course, this fails horribly on ext3, because it doesn't do anything
> useful with the journal. Another problem is error handling. Since FAT
> and ext2 don't have any form of reservation, a full disk isn't detected
> until it's far too late.
> 
> So, a VFS-level reservation function would indeed be nice to have.
> 
> I looked at ext3 delalloc briefly, and while it did indeed improve
> performance quite nicely, by being tied to ext3 internals, it would
> be difficult to use in the framework of ABISS, where the code paths
> are different (e.g. the prepare/commit functions should be as close
> to no-ops as possible, and leave all the work to the prefetcher
> thread), and which tries to be relatively file system independent.

I'm looking at whether we can do most of it at VFS level ... with
ext3 only taking care of the additional journalling bit - seems
quite feasible. There are two reqs (1) reservation (2) changing
mpage_writepages to use get_blocks(), which don't seem too hard.
ext3 ordered mode will need a bit more thought.

Of course, I haven't looked at how ABISS does delayed alloc -- 
do you have a patch snippet I can look at ?

Regards
Suparna


-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India



Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)

2005-03-14 Thread Werner Almesberger
Mingming Cao wrote:
> I agree delayed allocation make much sense with multiblock allocation.
> But I still think itself worth the effort, even without multiple block
> allocation.

On ABISS, we're currently also experimenting with delayed allocation.
There, the goal is less to improve overall performance than to move
the accesses out of the synchronous code path for write(2).

The code works quite nicely for FAT and ext2, limiting the time it
takes to make a write call writing new data to about 4-6 ms on a
fairly sluggish machine (plus about 2-4 ms for moving the playout
point, which is a separate operation in ABISS), and with eight
competing best-effort writers who each enjoy write latencies of some
8 seconds, worst-case, overwriting old data.

Of course, this fails horribly on ext3, because it doesn't do anything
useful with the journal. Another problem is error handling. Since FAT
and ext2 don't have any form of reservation, a full disk isn't detected
until it's far too late.

So, a VFS-level reservation function would indeed be nice to have.

I looked at ext3 delalloc briefly, and while it did indeed improve
performance quite nicely, by being tied to ext3 internals, it would
be difficult to use in the framework of ABISS, where the code paths
are different (e.g. the prepare/commit functions should be as close
to no-ops as possible, and leave all the work to the prefetcher
thread), and which tries to be relatively file system independent.

- Werner

-- 
  _
 / Werner Almesberger, Buenos Aires, Argentina [EMAIL PROTECTED] /
/_http://www.almesberger.net//