Re: [PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Josef Sipek
On Wed, Jan 17, 2007 at 04:55:49PM -0600, Dave Kleikamp wrote:
...
> diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h linux/fs/jfs/jfs_lock.h
> --- linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h  2006-11-29 15:57:37.0 -0600
> +++ linux/fs/jfs/jfs_lock.h   2007-01-17 15:30:19.0 -0600
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   *   jfs_lock.h
> @@ -42,6 +43,7 @@ do {  \
>   if (cond)   \
>   break;  \
>   unlock_cmd; \
> + blk_replug_current_nested();\
>   schedule(); \
>   lock_cmd;   \
>   }   \

Is {,un}lock_cmd a macro? ...

Jeff.

-- 
Keyboard not found!
Press F1 to enter Setup
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Dave Kleikamp
On Thu, 2007-01-18 at 10:46 +1100, Jens Axboe wrote:
> On Wed, Jan 17 2007, Dave Kleikamp wrote:
> > On Thu, 2007-01-18 at 10:18 +1100, Jens Axboe wrote:
> >
> > > Can you try io_schedule() and verify that things just work?
> >
> > I actually did do that in the first place, but wondered if it was the
> > right thing to introduce the accounting changes that came with that.
> > I'll change it back to io_schedule() and test it again, just to make
> > sure.
> 
> It appears to be the correct change to me - you really are waiting for
> IO resources (otherwise it would not hang with the plug change), so
> doing an inc/dec of iowait around the schedule should be done.

Okay, here it is.

> > If that's the right fix, I can push it directly since it won't have any
> > dependencies on your patches.
> 
> Perfect!

It should make the next -mm.

JFS: call io_schedule() instead of schedule() to avoid deadlock

Jens Axboe's explicit I/O plugging patches introduced a deadlock in jfs.
This was caused by the process initiating I/O not unplugging the queue
before waiting on the commit thread, while the commit thread itself was
waiting for that I/O to complete.  Calling io_schedule() rather than
schedule() unplugs the I/O queue, avoiding the deadlock, and it appears
to be the right function to call in any case.

Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>

---
commit 4aa0d230c2cfc1ac4bcf7c5466f9943cf14233a9
tree b873dce6146f4880c6c48ab53c0079566f52a60b
parent 82d5b9a7c63054a9a2cd838ffd177697f86e7e34
author Dave Kleikamp <[EMAIL PROTECTED]> Wed, 17 Jan 2007 21:18:35 -0600
committer Dave Kleikamp <[EMAIL PROTECTED]> Wed, 17 Jan 2007 21:18:35 -0600

 fs/jfs/jfs_lock.h     |    2 +-
 fs/jfs/jfs_metapage.c |    2 +-
 fs/jfs/jfs_txnmgr.c   |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/jfs/jfs_lock.h b/fs/jfs/jfs_lock.h
index 7d78e83..df48ece 100644
--- a/fs/jfs/jfs_lock.h
+++ b/fs/jfs/jfs_lock.h
@@ -42,7 +42,7 @@ do {  \
if (cond)   \
break;  \
unlock_cmd; \
-   schedule(); \
+   io_schedule();  \
lock_cmd;   \
}   \
current->state = TASK_RUNNING;  \
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index ceaf03b..58deae0 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -56,7 +56,7 @@ static inline void __lock_metapage(struct metapage *mp)
set_current_state(TASK_UNINTERRUPTIBLE);
if (metapage_locked(mp)) {
unlock_page(mp->page);
-   schedule();
+   io_schedule();
lock_page(mp->page);
}
} while (trylock_metapage(mp));
diff --git a/fs/jfs/jfs_txnmgr.c b/fs/jfs/jfs_txnmgr.c
index d558e51..6988a10 100644
--- a/fs/jfs/jfs_txnmgr.c
+++ b/fs/jfs/jfs_txnmgr.c
@@ -135,7 +135,7 @@ static inline void TXN_SLEEP_DROP_LOCK(wait_queue_head_t * event)
add_wait_queue(event, &wait);
set_current_state(TASK_UNINTERRUPTIBLE);
TXN_UNLOCK();
-   schedule();
+   io_schedule();
current->state = TASK_RUNNING;
remove_wait_queue(event, &wait);
 }
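The jfs_lock.h hunk above swaps the sleep inside a wait loop in which lock_cmd and unlock_cmd appear to be parameters of the enclosing macro, expanded as whole statements. The userspace sketch below mirrors only that loop shape so it can be compiled and exercised; SLEEP_COND_SKETCH, its extra sleep_cmd parameter and the fake lock helpers are hypothetical, and the wait-queue and task-state handling of the real macro is left out.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace analog of the jfs wait loop.  lock_cmd,
 * unlock_cmd and sleep_cmd are macro parameters that expand to whole
 * statements; sleep_cmd stands in for schedule()/io_schedule().
 * The real macro's wait-queue and task-state handling is omitted.
 */
#define SLEEP_COND_SKETCH(cond, lock_cmd, unlock_cmd, sleep_cmd)	\
do {									\
	for (;;) {							\
		if (cond)						\
			break;						\
		unlock_cmd;						\
		sleep_cmd;						\
		lock_cmd;						\
	}								\
} while (0)

static int lock_held;          /* 1 while the fake lock is held */
static int sleeps;             /* number of times the loop slept */
static int work_left;          /* the condition is true once this hits 0 */
static bool checked_unlocked;  /* set if cond was ever tested without the lock */

static void fake_lock(void)   { lock_held = 1; }
static void fake_unlock(void) { lock_held = 0; }

/* While "asleep", some other party makes one unit of progress. */
static void fake_sleep(void)  { sleeps++; work_left--; }

/* Condition check that also records whether the lock was held. */
static bool work_done(void)
{
	if (!lock_held)
		checked_unlocked = true;
	return work_left <= 0;
}

/* Wait, under the fake lock, until work_left drops to zero. */
int wait_for_work(int initial)
{
	work_left = initial;
	sleeps = 0;
	checked_unlocked = false;
	fake_lock();
	SLEEP_COND_SKETCH(work_done(), fake_lock(), fake_unlock(), fake_sleep());
	fake_unlock();
	return sleeps;
}
```

Because the lock and unlock steps are pasted in as statements, each caller chooses which lock is dropped around the sleep, and the condition is always re-checked with that lock held.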

-- 
David Kleikamp
IBM Linux Technology Center



Re: [PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Jens Axboe
On Wed, Jan 17 2007, Dave Kleikamp wrote:
> Jens,
> Can you please take a look at this patch, and if you think it's sane,
> add it to your explicit i/o plugging patchset?  Would it make sense in
> any of these paths to use io_schedule() instead of schedule()?

I'm glad you bring that up, actually. One of the "downsides" of the new
unplugging is that it really requires anyone waiting for IO in a path
like the file system or device driver to use io_schedule() instead of
schedule() to get the blk_replug_current_nested() done to avoid
deadlocks. While it is annoying that it could introduce some deadlocks
until we get things fixed, I do consider it a correctness fix even in
the generic kernel, as you are really waiting for IO and as such should
use io_schedule() in the first place.
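A rough userspace model of the behaviour Jens describes: an I/O-aware sleep counts the task as waiting for I/O, flushes whatever the caller still holds plugged (the job blk_replug_current_nested() does in the -mm patchset), sleeps, and then drops the count. io_schedule_sketch(), nr_iowait and the two callbacks are hypothetical stand-ins, not the kernel's implementation.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the per-runqueue iowait counter. */
static atomic_int nr_iowait;

/*
 * Sketch of an I/O-aware sleep: account the task as waiting for I/O,
 * push out any requests it still holds plugged, then sleep.
 * unplug_fn models blk_replug_current_nested(); sleep_fn models schedule().
 */
void io_schedule_sketch(void (*unplug_fn)(void), void (*sleep_fn)(void))
{
	atomic_fetch_add(&nr_iowait, 1);
	unplug_fn();
	sleep_fn();
	atomic_fetch_sub(&nr_iowait, 1);
}

int iowait_seen_while_sleeping = -1;  /* recorded by the fake sleep */
int unplug_calls;

static void fake_unplug(void) { unplug_calls++; }

static void fake_sleep(void)
{
	iowait_seen_while_sleeping = atomic_load(&nr_iowait);
}

/* Run one sleep and report the counter value after waking. */
int run_io_sleep(void)
{
	io_schedule_sketch(fake_unplug, fake_sleep);
	return atomic_load(&nr_iowait);
}
```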

Perhaps I should add a WARN_ON() check for this to catch these bugs
upfront.
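Jens only floats the WARN_ON() idea here, so the following is just one hypothetical shape for it, modelled in userspace: a plain (non-I/O) sleep that finds requests still sitting on the caller's plug list reports them, since no other task can submit them. struct plug_sketch and plain_sleep_checked() are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical per-task plug state. */
struct plug_sketch {
	int pending;   /* requests queued but not yet handed to the driver */
};

static int warnings;   /* how many times the check fired */

/* Plain sleep: warn if the caller still holds plugged I/O. */
void plain_sleep_checked(struct plug_sketch *plug)
{
	if (plug->pending > 0) {
		warnings++;
		fprintf(stderr, "warning: sleeping with %d plugged request(s)\n",
			plug->pending);
	}
	/* ...the actual sleep would happen here... */
}

int warning_count(void)
{
	return warnings;
}
```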

> I hadn't looked at your patchset until I discovered that jfs was easy to
> hang in the -mm kernel.  I think jfs may be able to add explicit
> plugging and unplugging in a couple of places, but I'd like to fix the
> hang right away and take my time with any later patches.

Can you try io_schedule() and verify that things just work?

-- 
Jens Axboe



Re: [PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Jens Axboe
On Wed, Jan 17 2007, Dave Kleikamp wrote:
> On Thu, 2007-01-18 at 10:18 +1100, Jens Axboe wrote:
> 
> > Can you try io_schedule() and verify that things just work?
> 
> I actually did do that in the first place, but wondered if it was the
> right thing to introduce the accounting changes that came with that.
> I'll change it back to io_schedule() and test it again, just to make
> sure.

It appears to be the correct change to me - you really are waiting for
IO resources (otherwise it would not hang with the plug change), so
doing an inc/dec of iowait around the schedule should be done.

> If that's the right fix, I can push it directly since it won't have any
> dependencies on your patches.

Perfect!

-- 
Jens Axboe



Re: [PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Dave Kleikamp
On Thu, 2007-01-18 at 10:18 +1100, Jens Axboe wrote:

> Can you try io_schedule() and verify that things just work?

I actually did do that in the first place, but wondered if it was the
right thing to introduce the accounting changes that came with that.
I'll change it back to io_schedule() and test it again, just to make
sure.

If that's the right fix, I can push it directly since it won't have any
dependencies on your patches.

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center



Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

2007-01-17 Thread Nate Diller

On Wed, 17 Jan 2007, Benjamin LaHaise wrote:

> On Mon, Jan 15, 2007 at 08:25:15PM -0800, Nate Diller wrote:
> > the right thing to do from a design perspective.  Hopefully it enables
> > a new architecture that can reduce context switches in I/O completion,
> > and reduce overhead.  That's the real motive ;)
>
> And it's a broken motive.  Context switches per se are not bad, as they
> make it possible to properly schedule code in a busy system (which is
> *very* important when realtime concerns come into play).  Have a look
> at how things were done in the 2.4 aio code to see how completion would
> get done with a non-retry method, typically in interrupt context.  I had
> code that did direct I/O rather differently by sharing code with the
> read/write code paths at some point, the catch being that it was pretty
> invasive, which meant that it never got merged with the changes to handle
> writeback pressure and other work that happened during 2.5.


I'm having some trouble understanding your concern.  From my perspective,
any unnecessary context switch represents not only performance loss, but
extra complexity in the code.  In this case, I'm not suggesting that the
aio.c code causes problems, quite the opposite.  The code I'd like to change
is at the FS and md levels, where context switches happen because of timers,
workqueues, and worker threads.  For sync I/O, these layers could be doing
their completion work in process context, but because waiting on sync I/O is
done in layers above, they must resort to other means, even for the common
case.  The dm-crypt module is the most straightforward example.

I took a look at some 2.4.18 aio patches in kernel.org/.../bcrl/aio/, and if
I understand what you did, you were basically operating at the aops level
rather than f_ops.  I actually like that idea; it's nicer than having the
direct-io code do its work separately from the aio code.  Part of where I'm
going with this patch is better integration between the block layer
(make_request), page layer (aops), and FS layer (f_ops), particularly in the
completion paths.  The direct-io code is an improvement over the common code
on that point: do_readahead() and friends all wait on individual pages to
become uptodate.  I'd like to bring some improvements from the direct-io
architecture into the common case, which I hope will help
performance.

I know that might seem somewhat unrelated, but I don't think it is.  This
change goes hand in hand with using completion handlers in the aops.  That
will link together the completion callback in the bio with the aio callback,
so that the whole stack can finish its work in one context.


> That said, you can't make kiocb private without completely removing the
> ability of the rest of the kernel to complete an aio sanely from irq context.
> You need some form of i/o descriptor, and a kiocb is just that.  Adding more
> layering is just going to make things messier and slower for no real gain.


This patchset does not change how or when I/O completion happens;
aio_complete() will still get called from direct-io.c, nfs-direct.c, et al.
The iocb structure is still passed to aio_complete(), just like before.  The
only difference is that the lower-level code doesn't know that it's got an
iocb; all it sees is an opaque cookie.  It's more like enforcing a layer
that's already in place, and I think things got simpler rather than messier.
Whether things are slower or not remains to be seen, but I expect no
measurable changes either way with this patch.
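The layering Nate describes can be sketched as a lower layer that stores only an opaque void * cookie and a completion function, and hands the cookie back when the I/O finishes, so it never learns that the cookie is an iocb. All names below (struct low_request, struct fake_iocb, fake_aio_complete() and so on) are hypothetical and are not the patchset's types.

```c
/* Completion callback type seen by the lower layer. */
typedef void (*end_io_fn)(void *cookie, long result);

/* Lower-layer request: an opaque cookie, no knowledge of the iocb. */
struct low_request {
	end_io_fn end_io;
	void *cookie;
};

/* Lower layer finishes the request and hands the cookie straight back,
 * in the same context that completed the I/O. */
static void low_complete(struct low_request *req, long result)
{
	req->end_io(req->cookie, result);
}

/* Upper (aio-like) layer: the real descriptor, hidden behind the cookie. */
struct fake_iocb {
	long result;
	int completed;
};

static void fake_aio_complete(void *cookie, long result)
{
	struct fake_iocb *iocb = cookie;   /* only this layer knows the type */

	iocb->result = result;
	iocb->completed = 1;
}

/* Submit through the lower layer and complete with the given byte count. */
struct fake_iocb submit_and_complete(long nbytes)
{
	struct fake_iocb iocb = { 0, 0 };
	struct low_request req = { fake_aio_complete, &iocb };

	low_complete(&req, nbytes);
	return iocb;
}
```

The cost is an untyped pointer at the layer boundary; what it buys is that the lower layer cannot reach into descriptor fields it has no business touching.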

I'm releasing a new version of the patch soon, it will use a new iodesc
structure to keep track of iovec state, which simplifies things further.  It
also will have a new version of the usb gadget code, and some general
cleanups.  I hope you'll take a look at it.
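As an illustration of what tracking iovec state can involve (this is not the iodesc structure from the upcoming patch), here is a small hypothetical cursor over a struct iovec array that skips bytes already transferred and splits a segment when a transfer ends inside it.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical cursor over an iovec array. */
struct iov_cursor {
	const struct iovec *iov;  /* remaining segments */
	size_t nr_segs;           /* number of remaining segments */
	size_t seg_offset;        /* bytes already consumed in iov[0] */
};

/* Total bytes still to transfer. */
size_t iov_cursor_remaining(const struct iov_cursor *c)
{
	size_t total = 0;

	for (size_t i = 0; i < c->nr_segs; i++)
		total += c->iov[i].iov_len;
	return total - c->seg_offset;
}

/* Record that n bytes were transferred; n must not exceed what remains. */
void iov_cursor_advance(struct iov_cursor *c, size_t n)
{
	while (n > 0 && c->nr_segs > 0) {
		size_t left = c->iov[0].iov_len - c->seg_offset;

		if (n < left) {           /* transfer ended inside this segment */
			c->seg_offset += n;
			return;
		}
		n -= left;                /* segment fully consumed: move on */
		c->iov++;
		c->nr_segs--;
		c->seg_offset = 0;
	}
}
```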

NATE


[PATCH: 2.6.20-rc4-mm1] JFS: Avoid deadlock introduced by explicit I/O plugging

2007-01-17 Thread Dave Kleikamp
Jens,
Can you please take a look at this patch, and if you think it's sane,
add it to your explicit i/o plugging patchset?  Would it make sense in
any of these paths to use io_schedule() instead of schedule()?

I hadn't looked at your patchset until I discovered that jfs was easy to
hang in the -mm kernel.  I think jfs may be able to add explicit
plugging and unplugging in a couple of places, but I'd like to fix the
hang right away and take my time with any later patches.

Thanks,
Shaggy

JFS: Avoid deadlock introduced by explicit I/O plugging

jfs is pretty easy to deadlock with Jens' explicit i/o plugging patchset.
Just try building a kernel.

The problem occurs when a synchronous transaction initiates some I/O, then
waits in lmGroupCommit for the transaction to be committed to the journal.
This requires action by the commit thread, which ends up waiting on a page
to complete writeback.  The commit thread did not initiate the I/O, so it
cannot unplug the io queue, and deadlock occurs.

The fix is for the first thread to call blk_replug_current_nested() before
going to sleep.  This patch also adds the call to a couple other places that
look like they need it.
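The sequence above can be reduced to a toy userspace model. In it, requests a task submits sit on that task's private plug list until the task itself unplugs, so the commit thread, which never submitted them, can only make progress if the waiting task flushes its plug before sleeping. Every name here is hypothetical; the model ignores real queues, threads and timing.

```c
#include <stdbool.h>

/* Requests held on one task's private plug list. */
struct task_sketch {
	int plugged;
};

/* Requests the device has actually been given and can complete. */
struct device_sketch {
	int dispatched;
};

static void submit_io(struct task_sketch *t)
{
	t->plugged++;                  /* queued, but not yet visible */
}

static void unplug(struct task_sketch *t, struct device_sketch *d)
{
	d->dispatched += t->plugged;   /* hand everything to the device */
	t->plugged = 0;
}

/*
 * A synchronous transaction submits one request and then sleeps waiting
 * for the commit thread.  The commit thread is waiting for that request
 * to finish writeback, which can only happen once it was dispatched.
 * Returns true if the commit thread can make progress.
 */
bool commit_can_progress(bool replug_before_sleep)
{
	struct task_sketch writer = { 0 };
	struct device_sketch dev = { 0 };

	submit_io(&writer);
	if (replug_before_sleep)
		unplug(&writer, &dev);     /* what the fix adds before schedule() */
	/* writer now sleeps; it cannot unplug again until it is woken */

	return dev.dispatched >= 1;    /* false here means deadlock */
}
```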

Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>

diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h linux/fs/jfs/jfs_lock.h
--- linux-2.6.20-rc4-mm1/fs/jfs/jfs_lock.h  2006-11-29 15:57:37.0 -0600
+++ linux/fs/jfs/jfs_lock.h 2007-01-17 15:30:19.0 -0600
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * jfs_lock.h
@@ -42,6 +43,7 @@ do {  \
if (cond)   \
break;  \
unlock_cmd; \
+   blk_replug_current_nested();\
schedule(); \
lock_cmd;   \
}   \
diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_metapage.c linux/fs/jfs/jfs_metapage.c
--- linux-2.6.20-rc4-mm1/fs/jfs/jfs_metapage.c  2007-01-12 09:50:45.0 -0600
+++ linux/fs/jfs/jfs_metapage.c 2007-01-17 15:28:46.0 -0600
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "jfs_incore.h"
 #include "jfs_superblock.h"
 #include "jfs_filsys.h"
@@ -56,6 +57,7 @@ static inline void __lock_metapage(struc
set_current_state(TASK_UNINTERRUPTIBLE);
if (metapage_locked(mp)) {
unlock_page(mp->page);
+   blk_replug_current_nested();
schedule();
lock_page(mp->page);
}
diff -Nurp linux-2.6.20-rc4-mm1/fs/jfs/jfs_txnmgr.c linux/fs/jfs/jfs_txnmgr.c
--- linux-2.6.20-rc4-mm1/fs/jfs/jfs_txnmgr.c  2007-01-12 09:50:45.0 -0600
+++ linux/fs/jfs/jfs_txnmgr.c   2007-01-17 15:29:04.0 -0600
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "jfs_incore.h"
 #include "jfs_inode.h"
 #include "jfs_filsys.h"
@@ -135,6 +136,7 @@ static inline void TXN_SLEEP_DROP_LOCK(w
add_wait_queue(event, &wait);
set_current_state(TASK_UNINTERRUPTIBLE);
TXN_UNLOCK();
+   blk_replug_current_nested();
schedule();
current->state = TASK_RUNNING;
remove_wait_queue(event, &wait);

-- 
David Kleikamp
IBM Linux Technology Center



Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

2007-01-17 Thread Benjamin LaHaise
On Mon, Jan 15, 2007 at 08:25:15PM -0800, Nate Diller wrote:
> the right thing to do from a design perspective.  Hopefully it enables
> a new architecture that can reduce context switches in I/O completion,
> and reduce overhead.  That's the real motive ;)

And it's a broken motive.  Context switches per se are not bad, as they 
make it possible to properly schedule code in a busy system (which is 
*very* important when realtime concerns come into play).  Have a look 
at how things were done in the 2.4 aio code to see how completion would 
get done with a non-retry method, typically in interrupt context.  I had 
code that did direct I/O rather differently by sharing code with the 
read/write code paths at some point, the catch being that it was pretty 
invasive, which meant that it never got merged with the changes to handle 
writeback pressure and other work that happened during 2.5.

That said, you can't make kiocb private without completely removing the 
ability of the rest of the kernel to complete an aio sanely from irq context.  
You need some form of i/o descriptor, and a kiocb is just that.  Adding more 
layering is just going to make things messier and slower for no real gain.

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.


Unpatched secunia advisories

2007-01-17 Thread Majkls
Hello,
Some time ago I sent an open advisory here, but in fact it had already
been fixed. I want to continue chasing open advisories and closed bugs,
so here are some from the FS area:

CVE-2006-2629 - Race condition in Linux kernel 2.6.15 to 2.6.17, when
running on SMP platforms, allows local users to cause a denial of
service (crash) by creating and exiting a large number of tasks, then
accessing the /proc entry of a task that is exiting, which causes memory
corruption that leads to a failure in the prune_dcache function or a
BUG_ON error in include/linux/list.h.
What is the status of this bug in all 2.6 branches?

I also suppose this bug has already been fixed in all 2.6 branches:
CVE-2004-1235 (http://secunia.com/advisories/13756/)

What is the status of this bug?
http://secunia.com/cve_reference/CVE-2004-1058/

This has already been fixed, hasn't it?
http://secunia.com/advisories/13126/
CVE-2004-1070, CVE-2004-1071, CVE-2004-1072, CVE-2004-1073

http://secunia.com/advisories/12426/ - Has this bug already been fixed?

http://secunia.com/advisories/12210/ - Linux Kernel File Offset Pointer
Handling Memory Disclosure Vulnerability. Secunia lists it as fixed
only in 2.4.

If you reply that a bug has already been fixed, please include a link to
the main git repository on kernel.org.


-- 
Miloslav "Majkls" Semler


Re: [take33 10/10] kevent: Kevent based AIO (aio_sendfile()/aio_sendfile_path()).

2007-01-17 Thread Evgeniy Polyakov
On Wed, Jan 17, 2007 at 07:21:42PM +0530, Suparna Bhattacharya ([EMAIL PROTECTED]) wrote:
> 
> Since you are implementing new APIs here, have you considered doing an
> aio_sendfilev to be able to send a header with the data ?

It is doable, but why do people not like corking?
With Linux's sub-microsecond syscall overhead, it is a better and more
flexible solution, isn't it?

I'm not saying - 'no, there will not be any *v variants', just getting
more info.
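A minimal sketch of the corking approach, using the standard Linux TCP_CORK socket option with write() and sendfile(2): the header and file data are queued while the socket is corked and go out together when the cork is removed. send_with_header() is a hypothetical helper; error handling is minimal and a short write of the header is simply treated as failure.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send hdr followed by count bytes of in_fd (from offset 0) over the
 * TCP socket sock.  Returns 0 on success, -1 on error.
 */
int send_with_header(int sock, const void *hdr, size_t hdr_len,
		     int in_fd, size_t count)
{
	int cork_on = 1, cork_off = 0;
	off_t offset = 0;
	int rc = -1;

	/* Hold back partial frames while the pieces are queued. */
	if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &cork_on, sizeof(cork_on)) < 0)
		return -1;

	if (write(sock, hdr, hdr_len) != (ssize_t)hdr_len)
		goto uncork;

	/* sendfile() advances offset by the number of bytes it sent. */
	while ((size_t)offset < count) {
		ssize_t n = sendfile(sock, in_fd, &offset, count - (size_t)offset);

		if (n <= 0)
			goto uncork;    /* error, or file shorter than count */
	}
	rc = 0;
uncork:
	/* Removing the cork pushes out anything still queued. */
	if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &cork_off, sizeof(cork_off)) < 0)
		rc = -1;
	return rc;
}
```

An aio_sendfilev() would fold the header into a single call instead; corking keeps the existing write()/sendfile() calls and only adds two setsockopt() calls around them.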

> Regards
> Suparna

-- 
Evgeniy Polyakov


Re: [take33 10/10] kevent: Kevent based AIO (aio_sendfile()/aio_sendfile_path()).

2007-01-17 Thread Suparna Bhattacharya

Since you are implementing new APIs here, have you considered doing an
aio_sendfilev to be able to send a header with the data ?

Regards
Suparna

On Wed, Jan 17, 2007 at 09:30:35AM +0300, Evgeniy Polyakov wrote:
> 
> Kevent based AIO (aio_sendfile()/aio_sendfile_path()).
> 
> aio_sendfile()/aio_sendfile_path() consists of two major parts: the AIO
> state machine and the page processing code.
> The former is just a small subsystem which allows callbacks to be queued
> for invocation in process context by a pool of kernel threads.  It allows
> caches of callbacks to be queued to the local thread or to any other
> specified thread.  Each cache of callbacks is processed while
> there are callbacks in it, and callbacks can requeue themselves into the
> same cache.
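The state-machine part described above (a cache of callbacks drained until it is empty, where a callback may requeue itself) can be sketched single-threaded as follows. struct cb_queue, cb_queue_run() and the step-counting example callback are hypothetical; the actual code runs this on a pool of kernel threads.

```c
#include <stdbool.h>
#include <stddef.h>

struct work_item;
/* A callback returns true to be requeued onto the same queue. */
typedef bool (*work_fn)(struct work_item *item);

struct work_item {
	work_fn fn;
	struct work_item *next;
	int steps_left;          /* example per-item state */
};

struct cb_queue {
	struct work_item *head, *tail;
};

void cb_queue_push(struct cb_queue *q, struct work_item *item)
{
	item->next = NULL;
	if (q->tail)
		q->tail->next = item;
	else
		q->head = item;
	q->tail = item;
}

static struct work_item *cb_queue_pop(struct cb_queue *q)
{
	struct work_item *item = q->head;

	if (item) {
		q->head = item->next;
		if (!q->head)
			q->tail = NULL;
	}
	return item;
}

/* Drain the queue; callbacks that ask to be requeued go to the back. */
int cb_queue_run(struct cb_queue *q)
{
	int calls = 0;
	struct work_item *item;

	while ((item = cb_queue_pop(q)) != NULL) {
		calls++;
		if (item->fn(item))
			cb_queue_push(q, item);
	}
	return calls;
}

/* Example callback: does one step per invocation until finished. */
bool step_callback(struct work_item *item)
{
	item->steps_left--;
	return item->steps_left > 0;
}
```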
> 
> Real work is being done in page processing code - code which populates
> pages into VFS cache and then sends pages to the destination socket
> via ->sendpage(). Unlike previous aio_sendfile() implementation, new
> one does not require low-level filesystem specific callbacks (->get_block())
> at all, instead I extended struct address_space_operations to contain new
> member called ->aio_readpages(), which is exactly the same as ->readpage()
> (read: mpage_readpages()) except different BIO allocation and submission
> routines. I changed mpage_readpages() to provide mpage_alloc() and
> mpage_bio_submit() to the new function called __mpage_readpages(), which is
> exactly old mpage_readpages() with provided callback invocation instead of
> usage for old functions. mpage_readpages_aio() provides kevent specific
> callbacks, which calls old functions, but with different destructor callbacks,
> which are essentially the same, except that they reschedule AIO processing.
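The refactoring in this paragraph, where one core routine takes the bio allocation and submission steps as callbacks so that the old entry point and the AIO variant differ only in what they pass, follows a common C pattern. The sketch below shows that pattern with hypothetical names (struct fake_bio, core_readpages(), the default and aio hooks); it does not reproduce the actual mpage.c signatures.

```c
#include <stdlib.h>

struct fake_bio {
	int nr_pages;
	void *priv;              /* completion context, like bi_private */
};

typedef struct fake_bio *(*alloc_fn)(void *priv);
typedef void (*submit_fn)(struct fake_bio *bio);

static int submitted_default, submitted_aio;

static struct fake_bio *default_alloc(void *priv)
{
	struct fake_bio *bio = calloc(1, sizeof(*bio));

	if (bio)
		bio->priv = priv;
	return bio;
}

static void default_submit(struct fake_bio *bio)
{
	submitted_default += bio->nr_pages;
	free(bio);
}

static void aio_submit(struct fake_bio *bio)
{
	/* an AIO variant would arrange to reschedule its state machine here */
	submitted_aio += bio->nr_pages;
	free(bio);
}

/* Core routine: batches pages into bios of at most per_bio pages. */
static int core_readpages(int nr_pages, int per_bio, void *priv,
			  alloc_fn alloc, submit_fn submit)
{
	while (nr_pages > 0) {
		struct fake_bio *bio = alloc(priv);

		if (!bio)
			return -1;
		bio->nr_pages = nr_pages < per_bio ? nr_pages : per_bio;
		nr_pages -= bio->nr_pages;
		submit(bio);
	}
	return 0;
}

/* Old entry point: unchanged behaviour via the default hooks. */
int readpages_sync(int nr_pages)
{
	return core_readpages(nr_pages, 4, NULL, default_alloc, default_submit);
}

/* AIO entry point: same core, different submission hook and context. */
int readpages_aio(int nr_pages, void *priv)
{
	return core_readpages(nr_pages, 4, priv, default_alloc, aio_submit);
}
```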
> 
> aio_sendfile_path() is essentially aio_sendfile(), except that it takes
> source filename as parameter and returns opened file descriptor.
> 
> Benchmark of the 100 1MB files transfer (files are in VFS already) using sync
> sendfile() against aio_sendfile_path() shows about 10MB/sec performance win
> (78 MB/s vs 66-72 MB/s over 1 Gb network, sendfile sending server is one-way
> AMD Athlon 64 3500+) for aio_sendfile_path().
> 
> AIO state machine is a base for network AIO (which becomes
> quite trivial), but I will not start implementation until
> the roadmap of kevent as a whole and the AIO implementation become clearer.
> 
> Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>
> 
> diff --git a/fs/bio.c b/fs/bio.c
> index 7618bcb..291e7e8 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -120,7 +120,7 @@ void bio_free(struct bio *bio, struct bio_set *bio_set)
>  /*
>   * default destructor for a bio allocated with bio_alloc_bioset()
>   */
> -static void bio_fs_destructor(struct bio *bio)
> +void bio_fs_destructor(struct bio *bio)
>  {
>   bio_free(bio, fs_bio_set);
>  }
> diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
> index beaf25f..f08c957 100644
> --- a/fs/ext3/inode.c
> +++ b/fs/ext3/inode.c
> @@ -1650,6 +1650,13 @@ ext3_readpages(struct file *file, struct address_space *mapping,
>   return mpage_readpages(mapping, pages, nr_pages, ext3_get_block);
>  }
> 
> +static int
> +ext3_readpages_aio(struct file *file, struct address_space *mapping,
> + struct list_head *pages, unsigned nr_pages, void *priv)
> +{
> + return mpage_readpages_aio(mapping, pages, nr_pages, ext3_get_block, priv);
> +}
> +
>  static void ext3_invalidatepage(struct page *page, unsigned long offset)
>  {
>   journal_t *journal = EXT3_JOURNAL(page->mapping->host);
> @@ -1768,6 +1775,7 @@ static int ext3_journalled_set_page_dirty(struct page *page)
>  }
> 
>  static const struct address_space_operations ext3_ordered_aops = {
> + .aio_readpages  = ext3_readpages_aio,
>   .readpage   = ext3_readpage,
>   .readpages  = ext3_readpages,
>   .writepage  = ext3_ordered_writepage,
> diff --git a/fs/mpage.c b/fs/mpage.c
> index 692a3e5..e5ba44b 100644
> --- a/fs/mpage.c
> +++ b/fs/mpage.c
> @@ -102,7 +102,7 @@ static struct bio *mpage_bio_submit(int rw, struct bio *bio)
>  static struct bio *
>  mpage_alloc(struct block_device *bdev,
>   sector_t first_sector, int nr_vecs,
> - gfp_t gfp_flags)
> + gfp_t gfp_flags, void *priv)
>  {
>   struct bio *bio;
> 
> @@ -116,6 +116,7 @@ mpage_alloc(struct block_device *bdev,
>   if (bio) {
>   bio->bi_bdev = bdev;
>   bio->bi_sector = first_sector;
> + bio->bi_private = priv;
>   }
>   return bio;
>  }
> @@ -175,7 +176,10 @@ map_buffer_to_page(struct page *page, struct buffer_head *bh, int page_block)
>  static struct bio *
>  do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
>   sector_t *last_block_in_bio, struct buffer_head *map_bh,
> - unsigned long *first_logical_block, get_block_t get_block)
> + unsigned long *first_logical_block, get_block_t get_block,
> + struct bio *(

Re: [RFC][PATCH 0/3] ext4 online defrag (ver 0.2)

2007-01-17 Thread Takashi Sato

Hi,


> On Jan 16, 2007  21:03 +0900, [EMAIL PROTECTED] wrote:
> > 1. Add new ioctl(EXT4_IOC_DEFRAG) which returns the first physical
> >    block number of the specified file.  With this ioctl, a command
> >    gets the specified directory's.


> Maybe I don't understand, but how is this different from the long-time
> FIBMAP ioctl?


I can use FIBMAP instead of my new ioctl.
You are right.  I should have used FIBMAP ioctl...
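For reference, FIBMAP maps a logical block of a file to the physical block backing it; its argument is an int used for both input and output, and the ioctl requires CAP_SYS_RAWIO. first_physical_block() below is a hypothetical helper name:

```c
#include <errno.h>
#include <linux/fs.h>     /* FIBMAP */
#include <sys/ioctl.h>

/*
 * Store the physical block backing logical block 0 of fd in *out and
 * return 0, or return -errno.  Needs CAP_SYS_RAWIO, and filesystems
 * without a bmap method reject the request.  A result of 0 in *out
 * means the block is not mapped (for example, a hole).
 */
int first_physical_block(int fd, int *out)
{
	int block = 0;          /* in: logical block, out: physical block */

	if (ioctl(fd, FIBMAP, &block) < 0)
		return -errno;
	*out = block;
	return 0;
}
```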


> > struct ext4_ext_defrag_data {
> > 	loff_t start_offset; /* start offset to defrag in byte */
> > 	loff_t defrag_size;  /* size of defrag in bytes */
> > 	ext4_fsblk_t goal;   /* block offset for allocation */
> > };


> Two things of note:
> - presumably the start_offset and defrag_size should be multiples of the
>   filesystem blocksize?  If they are not, is it an error or are they
>   adjusted to cover whole blocks?


If the values aren't multiples of the blocksize, the kernel adjusts
them to cover whole blocks.

But I think it isn't clean that the unit of goal is different from
start_offset and defrag_size.  I will change their unit to blocks
in the next update.
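The adjustment described above, widening a byte range so that it covers whole filesystem blocks, is a small piece of arithmetic. A sketch with hypothetical names, assuming a nonzero block size, a nonzero size, and a range that does not overflow 64 bits:

```c
#include <stdint.h>

/* Hypothetical result: the whole-block range covering a byte range. */
struct block_range {
	uint64_t first_block;
	uint64_t nr_blocks;
};

/* Blocks covering the bytes [offset, offset + size). */
struct block_range bytes_to_blocks(uint64_t offset, uint64_t size,
				   uint64_t blocksize)
{
	struct block_range r;
	uint64_t end = offset + size;                  /* exclusive */

	r.first_block = offset / blocksize;            /* round start down */
	r.nr_blocks = (end + blocksize - 1) / blocksize /* round end up */
		      - r.first_block;
	return r;
}
```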


> - in previous defrag discussions (i.e. XFS defrag), it was desirable to
>   allow specifying different types of goals (e.g. hard, soft, kernel picks).
>   We may as well have a structure that allows these to be specified, instead
>   of having to change the interface afterward.


Let me see...  Is it the following discussion?
http://marc.theaimsgroup.com/?l=linux-ext4&m=116161490908645&w=2
http://marc.theaimsgroup.com/?l=linux-ext4&m=116184475306761&w=2
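One hypothetical way to leave room for the goal types raised in the review (hard, soft, kernel picks), using block units as proposed above: a request structure with an explicit goal type, plus a check of whether an allocation satisfies it. This only illustrates the interface idea; it is not the structure from the ext4 patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical goal semantics. */
enum defrag_goal_type {
	DEFRAG_GOAL_NONE,   /* kernel picks the location */
	DEFRAG_GOAL_SOFT,   /* prefer goal, accept anything */
	DEFRAG_GOAL_HARD,   /* must start exactly at goal */
};

/* Hypothetical request; all quantities are in filesystem blocks. */
struct defrag_request_sketch {
	uint64_t start_block;   /* first logical block to defragment */
	uint64_t nr_blocks;     /* number of logical blocks */
	uint64_t goal;          /* desired physical start block */
	uint32_t goal_type;     /* enum defrag_goal_type */
	uint32_t reserved;      /* room for future flags, must be zero */
};

/* Does a new extent starting at alloc_start satisfy the request's goal? */
bool goal_satisfied(const struct defrag_request_sketch *req,
		    uint64_t alloc_start)
{
	switch (req->goal_type) {
	case DEFRAG_GOAL_HARD:
		return alloc_start == req->goal;
	case DEFRAG_GOAL_SOFT:
	case DEFRAG_GOAL_NONE:
		return true;
	default:
		return false;       /* unknown goal type: reject */
	}
}
```

Ordering the 64-bit fields first and padding with an explicit reserved word gives this particular struct the same 32-byte layout on 32- and 64-bit userspace, which matters for an ioctl argument.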

Cheers, Takashi