Re: [patch 30/41] reiserfs convert to new aops.

2007-05-23 Thread Nick Piggin
On Thu, May 24, 2007 at 03:46:46AM +0400, Vladimir V. Saveliev wrote:
 Hello, Nick
 
 On Wednesday 16 May 2007 21:22, Vladimir V. Saveliev wrote:
  
  Two questions - first, is it possible to get rid of reiserfs_prepare_write /
  commit_write? 
 
 I assume this is not a problem anymore as long as prepare_write and 
 commit_write of address space operations are NULL now 
 (with last version of reiserfs-convert-to-new-aops.patch).

Yes, that should be fine and allow us to remove some of the generic code
now.

 
  Second, can you make use of AOP_FLAG_CONT_EXPAND in order to 
  get rid of the special generic_cont_expand routine for reiserfs?
 
 Sorry for delay with this. Did you mean something like this?
 
Yes I did, thanks very much, we can now get rid of that weird
generic_cont_expand too (assuming these patches will ever get
merged).

Thanks for doing this Vladimir!

 
 From: Vladimir Saveliev [EMAIL PROTECTED]
 
 This patch makes reiserfs to use AOP_FLAG_CONT_EXPAND
 in order to get rid of the special generic_cont_expand routine
  
 Signed-off-by: Vladimir Saveliev [EMAIL PROTECTED]
 
 
 
 
 diff -puN fs/reiserfs/inode.c~fs-reiserfs-use-generic_cont_expand_simple 
 fs/reiserfs/inode.c
 --- 
 linux-2.6.21-mm2/fs/reiserfs/inode.c~fs-reiserfs-use-generic_cont_expand_simple
2007-05-24 02:29:52.0 +0300
 +++ linux-2.6.21-mm2-vs/fs/reiserfs/inode.c   2007-05-24 02:34:26.0 
 +0300
 @@ -2561,13 +2561,20 @@ static int reiserfs_write_begin(struct f
   int ret;
   int old_ref = 0;
  
 + inode = mapping-host;
 + *fsdata = 0;
 + if (flags  AOP_FLAG_CONT_EXPAND 
 + (pos  (inode-i_sb-s_blocksize - 1)) == 0) {
 + pos ++;
 + *fsdata = (void *)flags;
 + }
 +
   index = pos  PAGE_CACHE_SHIFT;
   page = __grab_cache_page(mapping, index);
   if (!page)
   return -ENOMEM;
   *pagep = page;
  
 - inode = mapping-host;
   reiserfs_wait_on_write_block(inode-i_sb);
   fix_tail_page_for_writing(page);
   if (reiserfs_transaction_running(inode-i_sb)) {
 @@ -2677,6 +2684,8 @@ static int reiserfs_write_end(struct fil
   struct reiserfs_transaction_handle *th;
   unsigned start;
  
 + if ((unsigned)fsdata  AOP_FLAG_CONT_EXPAND)
 + pos ++;
  
   reiserfs_wait_on_write_block(inode-i_sb);
   if (reiserfs_transaction_running(inode-i_sb))
 @@ -3065,7 +3074,7 @@ int reiserfs_setattr(struct dentry *dent
   }
   /* fill in hole pointers in the expanding truncate case. */
   if (attr-ia_size  inode-i_size) {
 - error = generic_cont_expand(inode, attr-ia_size);
 + error = generic_cont_expand_simple(inode, 
 attr-ia_size);
   if (REISERFS_I(inode)-i_prealloc_count  0) {
   int err;
   struct reiserfs_transaction_handle th;
 
 _



Re: Announce: new-aops-1 for 2.6.21-rc3

2007-03-19 Thread Nick Piggin
On Thu, Mar 15, 2007 at 04:47:13PM -0700, Mark Fasheh wrote:
 On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
  (excludes the OCFS2 patch that Mark sent, in anticipation of an update)
 
 Attached is said patch. I needed to export __grab_cache_page (ext2/ext3 also
 need this if they're to be built as modules), so a patch to do that is also
 attached.
 
 This passed some preliminary testing on a two node cluster I have here at
 Oracle.

Thanks Mark, I've merged these.

Nick


Announce: new-aops-1 for 2.6.21-rc3

2007-03-15 Thread Nick Piggin
OK, I've gone through and fixed several bugs until the thing actually
survives fsx-linux for both ext2 and ext3 ordered and writeback (both
when using the new aops, and the legacy prepare_write path). Actually
ext3 sometimes breaks, but it does in unpatched kernels anyway.

At 15 patches (including the initial buffered write deadlock fixes),
it is too much to keep posting -- not much has fundamentally changed,
so I'll just post occasionally if we make big changes. The quilt
format is probably easier for someone wishing to work on it anyway.

http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/

(excludes the OCFS2 patch that Mark sent, in anticipation of an update)

It would be really nice if filesystem developers could take a look
at the new interfaces some time, because otherwise they might get stuck
with it :) So I'm cc'ing a few filesystems that come to mind, that I 
haven't heard anything from. 

Thanks,
Nick


Re: Announce: new-aops-1 for 2.6.21-rc3

2007-03-15 Thread Nick Piggin
On Thu, Mar 15, 2007 at 12:32:45PM -0700, Joel Becker wrote:
 On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
  At 15 patches (including the initial buffered write deadlock fixes),
  it is too much to keep posting -- not much has fundamentally changed,
  so I'll just post occasionally if we make big changes. The quilt
  format is probably easier for someone wishing to work on it anyway.
  
  http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/
 
   For future drops, can you provide the unpacked patches too, so
 lazy people like me can read them in the browser?  Thanks.

Sorry, I did intend to unpack that, but forgot. It's done now, the
new directory containing the patches is under the same URL as above.

Thanks,
Nick


Re: Announce: new-aops-1 for 2.6.21-rc3

2007-03-15 Thread Nick Piggin
On Thu, Mar 15, 2007 at 12:53:51PM -0700, Mark Fasheh wrote:
 On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
  OK, I've gone through and fixed several bugs until the thing actually
  survives fsx-linux for both ext2 and ext3 ordered and writeback (both
  when using the new aops, and the legacy prepare_write path). Actually
  ext3 sometimes breaks, but it does in unpatched kernels anyway.
  
  At 15 patches (including the initial buffered write deadlock fixes),
  it is too much to keep posting -- not much has fundamentally changed,
  so I'll just post occasionally if we make big changes. The quilt
  format is probably easier for someone wishing to work on it anyway.
 
 Hmm, we still left out some exports...

Thanks, applied.


reiserfs set_page_dirty vs truncate race?

2006-10-10 Thread Nick Piggin
Hi,

I wonder if resierfs needs to be careful about truncated pages, or does
your .invalidatepage provide synchronisation against set_page_dirty somehow?
--
Index: linux-2.6/fs/reiserfs/inode.c
===
--- linux-2.6.orig/fs/reiserfs/inode.c
+++ linux-2.6/fs/reiserfs/inode.c
@@ -2840,10 +2840,14 @@ static void reiserfs_invalidatepage(stru
 
 static int reiserfs_set_page_dirty(struct page *page)
 {
-   struct inode *inode = page-mapping-host;
-   if (reiserfs_file_data_log(inode)) {
-   SetPageChecked(page);
-   return __set_page_dirty_nobuffers(page);
+   struct address_space *mapping = page-mapping;
+   struct inode *inode;
+   if (mapping) {
+   inode = mapping-host;
+   if (reiserfs_file_data_log(inode)) {
+   SetPageChecked(page);
+   return __set_page_dirty_nobuffers(page);
+   }
}
return __set_page_dirty_buffers(page);
 }



Re: reiserfs set_page_dirty vs truncate race?

2006-10-10 Thread Nick Piggin
On Tue, Oct 10, 2006 at 06:29:27PM +0400, Vladimir V. Saveliev wrote:
 Hello
 
 On Tuesday 10 October 2006 11:44, Nick Piggin wrote:
  Hi,
  
  I wonder if resierfs needs to be careful about truncated pages, 
 
 I think it is assumed that __set_page_dirty_nobuffers and 
 __set_page_dirty_buffers care about truncated pages.

So they may operate on truncated pages? Then you probably need the fix.
Anyway, if you would like to merge it, you may add a
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

Thanks,
Nick


[patch] reiser4 releasepage fix

2006-04-15 Thread Nick Piggin
Hi Andrew,

Can you put the following in -mm please? I cc'ed Hans about it earlier.
Basically, resier4 does not present a good reason why pagecache should
be dropped here. Basically the exact same operations are performed by
releasepage callers (reclaim or truncate). If there is some atomicity
problem, the code removed here wouldn't solve it anyway.

And unless reiser4 performs some of its own special synchronisation, I
can't see how it gets the pagecache removal sequence right for the
reclaim case (ie. careful checks of page_count and PageDirty under
tree_lock).

With this patch in place, reiser4-reget-page-mapping.patch can go.
I would also be worried about exporting remove_from_page_cache... at
least the __ variant can stop being exported now.

--
Index: linux-2.6/fs/reiser4/as_ops.c
===
--- linux-2.6.orig/fs/reiser4/as_ops.c
+++ linux-2.6/fs/reiser4/as_ops.c
@@ -358,11 +358,6 @@ int reiser4_releasepage(struct page *pag
assert(reiser4-4, page-mapping != NULL);
assert(reiser4-5, page-mapping-host != NULL);
 
-   /* is_page_cache_freeable() check
-  (mapping + private + page_cache_get() by shrink_cache()) */
-   if (page_count(page)  3)
-   return 0;
-
if (PageDirty(page))
return 0;
 
@@ -385,14 +380,6 @@ int reiser4_releasepage(struct page *pag
/* we are under memory pressure so release jnode also. */
jput(node);
 
-   write_lock_irq(mapping-tree_lock);
-   /* shrink_list() + radix-tree */
-   if (page_count(page) == 2) {
-   __remove_from_page_cache(page);
-   atomic_dec(page-_count);
-   }
-   write_unlock_irq(mapping-tree_lock);
-
return 1;
} else {
spin_unlock((node-load));


Re: [patch] reiser4 releasepage fix

2006-04-15 Thread Nick Piggin
On Sat, Apr 15, 2006 at 10:57:58AM +0200, Nick Piggin wrote:
 Hi Andrew,
 
 Can you put the following in -mm please? I cc'ed Hans about it earlier.

I did test it under reclaim pressure, btw. Seems to be working OK.


Re: [patch] reiser4 releasepage fix

2006-04-15 Thread Nick Piggin
On Sat, Apr 15, 2006 at 06:17:33PM +0400, Alexander Zarochentsev wrote:
 hello,
 
 On Saturday 15 April 2006 12:57, Nick Piggin wrote:
  Hi Andrew,
 
  Can you put the following in -mm please? I cc'ed Hans about it
  earlier. Basically, resier4 does not present a good reason why
  pagecache should be dropped here. 
 
 correct. it duplicates remove_mapping().
 
 we tested similar patch recently and found no problems.

Oh good :)

  --
  Index: linux-2.6/fs/reiser4/as_ops.c
  ===
  --- linux-2.6.orig/fs/reiser4/as_ops.c
  +++ linux-2.6/fs/reiser4/as_ops.c
  @@ -358,11 +358,6 @@ int reiser4_releasepage(struct page *pag
  assert(reiser4-4, page-mapping != NULL);
  assert(reiser4-5, page-mapping-host != NULL);
 
 
 Except my patch didn't remove the page ref count check from 
 reiser4_releasepage.
 
 Nick, does the check conflict with the lockless pagecache patches?  
 Reiser4 just tries to avoid unnecessary detaching private page objects 
 if the page_count doesn't allow page removal.

Well the check is racy anyway because if the page is in pagecache,
then there is nothing to prevent find_get_page from running (eg.
page fault, or read(2)).

So strictly, no, the check does not conflict with lockless pagecache.
It is simply nice to cut down the number of places where it isn't
immediately obvious.

I'd also like to think about removing page_count from outside mm/, in
favour of a pagecache API which gives us more confidence that the
caller is sane and has taken correct locks, etc.

Anway, if you can ack this for Andrew, I'm sure he'd sleep a bit
better (as would I)!


 
  -   /* is_page_cache_freeable() check
  -  (mapping + private + page_cache_get() by shrink_cache()) */
  -   if (page_count(page)  3)
  -   return 0;
  -
  if (PageDirty(page))
  return 0;
 

BTW. the PageDirty check is also racy, so it can probably go too. I
didn't have the courage to remove it because I don't know whether
your page dirtying functions try to somehow synchronise against this.

Thanks,
Nick


elevators (was Re: I request inclusion of reiser4 in the mainline kernel)

2005-09-20 Thread Nick Piggin
CC list trimmed.

On Tue, 2005-09-20 at 00:59 -0700, Hans Reiser wrote:
 Nick Piggin wrote:
 
 On Mon, 2005-09-19 at 23:28 -0700, Hans Reiser wrote:
   
 

 Well the terminology changed to io scheduler now, however the
 residual elevator name found in places doesn't cause anyone
 any problems and there isn't much reason to change it other than
 the desire to break things.
   
 
 Did you really say that?I mean, come on, can't you at least manage a
 well, it ought to get changed but I am busy with something more
 exciting to me.
 

In a perfect world maybe it should be changed, however I don't
know what out of tree drivers rely on the interface and I really
don't care to find out. Simple as that.

 Ask Nate about this after he gets an ok from the customer to disclose
 his work.  It is not so simple as you claim.
 

Nate, I would be very interested to hear about your work if
and when you are able to disclose it.

[snip devfs]

Yeah I was just trying to introduce some humour to the thread!
Or maybe deflate one flamewar by starting another :)

Nick

-- 
SUSE Labs, Novell Inc.



Send instant messages to your online friends http://au.messenger.yahoo.com 



Re: I request inclusion of reiser4 in the mainline kernel

2005-09-19 Thread Nick Piggin

Hans Reiser wrote:


So why is the code in the kernel so hard to read then?

Linux kernel code is getting better, and Andrew Morton's code is
especially good, but for the most part it's unnecessarily hard to read. 
Look at the elevator code for instance.  Ugh.





What's wrong with the elevator code?

The elevator code was one of the first things I got involved with
as a complete kernel newbie, and I was able to follow the current
code well enough to make a new IO scheduler, and extend the
elevator API sufficiently to provide the fairly unique
capabilities I needed.

If it is the elevator *API* you are worried about, that is fairly
trivial and well documented by Jens and myself in
Documentation/block/biodoc.txt, along with an overview of some key
ideas useful for iosched implementors.

as-iosched.c itself is IMO reasonably well commented (at least
the non-trivial, non-boilerplate functions). That is not to say it
is trivial to understand because it is a fairly complex state
machine and heuristics, but at less than 2000 lines of very well
contained code it is not an impossible task to understand it.

If that is too much for you, noop-iosched.c implements a fully
working io scheduler in exactly 94 lines, including whitespace and
comments.

What are your specific concerns? I would be interested in helping
to fix any valid ones you have.

Thanks,
Nick


Send instant messages to your online friends http://au.messenger.yahoo.com 



Re: reiser4 plugins

2005-06-27 Thread Nick Piggin

Markus Törnqvist wrote:

 I can't find the original post I'm thinking about but
 http://lkml.org/lkml/2005/5/16/68 says essentially the same thing.

The scheduler is being improved for better behaviour on complex
topologies like multi core + NUMA and multi level NUMA systems.
If Con's work had gone in first, then conversely these improvements
would have had to wait.


There's also my all-time favorite, http://lkml.org/lkml/2005/3/14/4



What's wrong with that? The slowdown is due to the workload
becoming disk bound. The reasons are still not entirely clear,
but I don't think it is a recent (ie. 2.6) regression (or even
a regression at all IIRC).


The lack of QA seems appalling here, and I'm sure Reiser has had
to do more of that for DARPA than most linux kernel hackers around.



And what QA would you have preferred?

I think if you are resorting to bringing up all time favourite
blunders when trying to justify Reiser4 being included, then
that is a sign right there that something is fundamentally wrong
(if not with the code, then with your line of thought0

And note my email has nothing to do with any *real* argument for
or against R4.

Thanks,
Nick

Send instant messages to your online friends http://au.messenger.yahoo.com 



Re: reiser4 plugins

2005-06-27 Thread Nick Piggin

Markus Törnqvist wrote:

On Mon, Jun 27, 2005 at 07:46:15PM +1000, Nick Piggin wrote:


The scheduler is being improved for better behaviour on complex
topologies like multi core + NUMA and multi level NUMA systems.
If Con's work had gone in first, then conversely these improvements
would have had to wait.



Or be merged in later.



It _was_ merged later. Con was talking about Andrew's -mm tree.


The problem is, why do the interfaces have to live so much that
examples like this come to my ears (or eyes ;) all the time?



The scheduler interfaces are probably among the most stable
in the kernel. Con was talking about internal implementation.


I hate to say this without digging out any URLs, but one friend
of mine says he has a very hard time doing any networking code
because it's too labile. Maybe that's being embettered for something
else too?



It seems there are plenty of people who can do networking code
just fine.


Or the other friend who curses that the networking code is just
crap and basically has to rewrite the code to get it working.
Yes, I've tried to get these guys to submit their code, but they
argue back that no one wants to see it.



I get the feeling your friend is making up some tall tales if he
says he rewrote the networking code to be much better than it is
now. But I can guarantee him that if he has it will be of great
interest to the network developers.




There's also my all-time favorite, http://lkml.org/lkml/2005/3/14/4


What's wrong with that? The slowdown is due to the workload
becoming disk bound. The reasons are still not entirely clear,
but I don't think it is a recent (ie. 2.6) regression (or even
a regression at all IIRC).



It's not that easy.

So let's say the scheduler slowdown is a temporary situation to
adapt it to multicore+NUMA.
I assume that's what you mean, by having the kernels on par
with 2.6.10 and the above paragraph.


I'm pretty sure it isn't a scheduler slowdown, but rather the
system is getting disk bound. I think the reporter has
performance back to 2.6.10 levels - it is actually something I
still have to follow up on, but it is difficult because the
reporter isn't able to share his code, although he's otherwise
very helpful.



Why on earth did the slowdown ever get merged and we have to wait
for it to be on par with some older version?



If slowdowns do get merged, it is generally because they either
haven't been reported until being merged or they make an accepted
tradeoff. Not that they haven't been tested.

Let's just get this straight, no amount of QA other than releasing
a kernel and asking people to use it is going to find all the bugs
that will be encountered.



Please do correct me if I'm wrong :)



I mostly agree with you apart from your specific examples I think
we could do things better, and most of us have made some big blunders,
However there is just no way that bringing any of that up will help
Reiser4 being merged.

Send instant messages to your online friends http://au.messenger.yahoo.com