Re: page_launder() bug

2001-05-14 Thread Marcelo Tosatti



On Sun, 13 May 2001, Linus Torvalds wrote:

> 
> On Sun, 13 May 2001, Rik van Riel wrote:
> > 
> > This means that the swapin path (and the same path for
> > other pagecache pages) doesn't take the page lock and
> > the page lock doesn't protect us from other people using
> > the page while we have it locked.
> 
> You can test for swap cache deadness without holding the page cache lock:
> if the swap count is 1, then we know that nobody else has this swap entry
> in its page tables, and thus there can not be any concurrent lookups
> either.
> 
> Now, it may well be that we need to make sure that there is some proper
> ordering (nobody must decrement the swap count before they increment the
> page count or something). I think that is the case anyway (and I _think_
> that everybody that mucks with the swap count always hold the page count -
> this might be a good thing to check).

Swapin readahead _first_ increases the swap map count for the given page
and then increases the page count.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-14 Thread Kai Henningsen

[EMAIL PROTECTED] (Linus Torvalds)  wrote on 13.05.01 in 
<[EMAIL PROTECTED]>:

> And that's why I'd rather have generic support for _any_ page mapping that
> wants to drop pages early than have specific logic for swapping.
> Historically, we've always had very good results from trying to avoid
> having special cases and instead trying to find what the common rules
> might be.

Just a thought: isn't a dead swap page a rather similar condition to a  
page in a file without any links, open file descriptors, or open mmaps? In  
both cases, writeback really makes no sense.

MfG Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-14 Thread Kai Henningsen

[EMAIL PROTECTED] (Linus Torvalds)  wrote on 13.05.01 in 
[EMAIL PROTECTED]:

 And that's why I'd rather have generic support for _any_ page mapping that
 wants to drop pages early than have specific logic for swapping.
 Historically, we've always had very good results from trying to avoid
 having special cases and instead trying to find what the common rules
 might be.

Just a thought: isn't a dead swap page a rather similar condition to a  
page in a file without any links, open file descriptors, or open mmaps? In  
both cases, writeback really makes no sense.

MfG Kai
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-14 Thread Marcelo Tosatti



On Sun, 13 May 2001, Linus Torvalds wrote:

 
 On Sun, 13 May 2001, Rik van Riel wrote:
  
  This means that the swapin path (and the same path for
  other pagecache pages) doesn't take the page lock and
  the page lock doesn't protect us from other people using
  the page while we have it locked.
 
 You can test for swap cache deadness without holding the page cache lock:
 if the swap count is 1, then we know that nobody else has this swap entry
 in its page tables, and thus there can not be any concurrent lookups
 either.
 
 Now, it may well be that we need to make sure that there is some proper
 ordering (nobody must decrement the swap count before they increment the
 page count or something). I think that is the case anyway (and I _think_
 that everybody that mucks with the swap count always hold the page count -
 this might be a good thing to check).

Swapin readahead _first_ increases the swap map count for the given page
and then increases the page count.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Rik van Riel

On Sun, 13 May 2001, Linus Torvalds wrote:
> On Sun, 13 May 2001, Rik van Riel wrote:
> > 
> > Why the hell would we want this ?
> You've missed about half the discussion, it seems..

True, I was away at a conference ;)

> > If the page is referenced, it should be moved back to the
> > active list and should never be a candidate for writeout.
> 
> Wrong.
> 
> There are
>  (a) dead swap pages, where it doesn't matter one _whit_ whether it is
>  referenced or not, because we know with 100% certainty that nobody
>  will ever reference it again. This _may_ be true in other cases too,
>  but we know it is true for swap pages that have lost all references.
>  (b) filesystems and memory allocators that might want to get feedback on
>  the fact that we're even _looking_ at their pages, and that we're
>  aging them down. They might easily use these things for starting
>  background activity like deciding to close the logs..
> 
> The high-level VM layer simply doesn't have that kind of information.

Agreed.  I'd like to make sure, however, that we keep the
high-level VM cleanly separated from the lower layers so
we can keep the VM maintainable and predictable...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Linus Torvalds


On Sun, 13 May 2001, Rik van Riel wrote:
> 
> This means that the swapin path (and the same path for
> other pagecache pages) doesn't take the page lock and
> the page lock doesn't protect us from other people using
> the page while we have it locked.

You can test for swap cache deadness without holding the page cache lock:
if the swap count is 1, then we know that nobody else has this swap entry
in its page tables, and thus there can not be any concurrent lookups
either.

Now, it may well be that we need to make sure that there is some proper
ordering (nobody must decrement the swap count before they increment the
page count or something). I think that is the case anyway (and I _think_
that everybody that mucks with the swap count always hold the page count -
this might be a good thing to check).

So while you're right in general, there are some things we can do with
less global locking. Just holding the page lock will guarantee that at
least nobody changes the index etc from under us, so that the things we
test are at least fairly well-defined.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread David S. Miller


Rik van Riel writes:
 > Then I'd rather check this in a visible place in page_launder()
 > itself. Granted, this is a special case, but I don't think this
 > one is worth obfuscating the code for...

I think Linus's scheme is just fine, controlling the new 'priority'
argument to writepage() using the referenced bit as an input.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Rik van Riel

On Sun, 13 May 2001, David S. Miller wrote:
> Rik van Riel writes:
>  > On Tue, 8 May 2001, David S. Miller wrote:
>  > > Nice.  Now the only bit left is moving the referenced bit
>  > > checking and/or state into writepage as well.  This is still
>  > > part of the plan right?
>  > 
>  > Why the hell would we want this ?
> 
> Because if it's a dead swap page the referenced bit is meaningless
> and we should just kill off the page immediately.

Then I'd rather check this in a visible place in page_launder()
itself. Granted, this is a special case, but I don't think this
one is worth obfuscating the code for...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread David S. Miller


Rik van Riel writes:
 > On Tue, 8 May 2001, David S. Miller wrote:
 > > Nice.  Now the only bit left is moving the referenced bit
 > > checking and/or state into writepage as well.  This is still
 > > part of the plan right?
 > 
 > Why the hell would we want this ?

Because if it's a dead swap page the referenced bit is meaningless
and we should just kill off the page immediately.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Rik van Riel

On Tue, 8 May 2001, David S. Miller wrote:
> Marcelo Tosatti writes:
>  > Ok, this patch implements thet thing and also changes ext2+swap+shm
>  > writepage operations (so I could test the thing).
>  > 
>  > The performance is better with the patch on my restricted swapping tests.
> 
> Nice.  Now the only bit left is moving the referenced bit
> checking and/or state into writepage as well.  This is still
> part of the plan right?

Why the hell would we want this ?

If the page is referenced, it should be moved back to the
active list and should never be a candidate for writeout.

I'm very happy we got rid of the horribly intertwined VM
code in 2.2 and went to a more separated design in 2.4...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Rik van Riel

On Tue, 8 May 2001, Mikulas Patocka wrote:

> > +   if (!dead_swap_page &&
> > +   (PageTestandClearReferenced(page) || page->age > 0 ||
> > +(!page->buffers && page_count(page) > 1) ||
> > +page_ramdisk(page))) {
>   ^^
> > del_page_from_inactive_dirty_list(page);
> > add_page_to_active_list(page);
> > continue;
> 
> #define page_ramdisk(page) \
> (page->buffers && (MAJOR(page->buffers->b_dev) == RAMDISK_MAJOR))
> 
> Are you sure that no one will release buffers under your hands?

Two things can happen:

1) the page gets ramdisk buffers _after_ we look at it first,
   in this case the page isn't freeable and will be moved to
   the active list on the next page_launder() loop

2) the page loses its ramdisk buffers after we look at it,
   now the page is freeable, but we won't see it again until
   it is moved from the active list to the inactive_dirty
   list again

Any side effects harmful enough to warrant complicating this
test ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Rik van Riel

On Mon, 7 May 2001, Linus Torvalds wrote:

> Which you MUST NOT do without holding the page lock.
> 
> Hint: it needs "page->index", and without holding the page lock you
> don't know what it could be.

Wouldn't that be the pagecache_lock ?

Remember that the semantics for find_swap_page() and
friends got changed recently to first test PageUptodate
and only try to lock the page if that didn't work out.

This means that the swapin path (and the same path for
other pagecache pages) doesn't take the page lock and
the page lock doesn't protect us from other people using
the page while we have it locked.

What _does_ protect us, however, is the fact that
reclaim_page() grabs the pagecache_lock...

[OTOH, I could be out to lunch here since I was away for
almost a week and haven't checked if the discussed changes
really were integrated into the kernel]

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Rik van Riel

On Mon, 7 May 2001, Linus Torvalds wrote:

 Which you MUST NOT do without holding the page lock.
 
 Hint: it needs page-index, and without holding the page lock you
 don't know what it could be.

Wouldn't that be the pagecache_lock ?

Remember that the semantics for find_swap_page() and
friends got changed recently to first test PageUptodate
and only try to lock the page if that didn't work out.

This means that the swapin path (and the same path for
other pagecache pages) doesn't take the page lock and
the page lock doesn't protect us from other people using
the page while we have it locked.

What _does_ protect us, however, is the fact that
reclaim_page() grabs the pagecache_lock...

[OTOH, I could be out to lunch here since I was away for
almost a week and haven't checked if the discussed changes
really were integrated into the kernel]

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Linus Torvalds


On Sun, 13 May 2001, Rik van Riel wrote:
 
 This means that the swapin path (and the same path for
 other pagecache pages) doesn't take the page lock and
 the page lock doesn't protect us from other people using
 the page while we have it locked.

You can test for swap cache deadness without holding the page cache lock:
if the swap count is 1, then we know that nobody else has this swap entry
in its page tables, and thus there can not be any concurrent lookups
either.

Now, it may well be that we need to make sure that there is some proper
ordering (nobody must decrement the swap count before they increment the
page count or something). I think that is the case anyway (and I _think_
that everybody that mucks with the swap count always hold the page count -
this might be a good thing to check).

So while you're right in general, there are some things we can do with
less global locking. Just holding the page lock will guarantee that at
least nobody changes the index etc from under us, so that the things we
test are at least fairly well-defined.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Rik van Riel

On Tue, 8 May 2001, Mikulas Patocka wrote:

  +   if (!dead_swap_page 
  +   (PageTestandClearReferenced(page) || page-age  0 ||
  +(!page-buffers  page_count(page)  1) ||
  +page_ramdisk(page))) {
   ^^
  del_page_from_inactive_dirty_list(page);
  add_page_to_active_list(page);
  continue;
 
 #define page_ramdisk(page) \
 (page-buffers  (MAJOR(page-buffers-b_dev) == RAMDISK_MAJOR))
 
 Are you sure that no one will release buffers under your hands?

Two things can happen:

1) the page gets ramdisk buffers _after_ we look at it first,
   in this case the page isn't freeable and will be moved to
   the active list on the next page_launder() loop

2) the page loses its ramdisk buffers after we look at it,
   now the page is freeable, but we won't see it again until
   it is moved from the active list to the inactive_dirty
   list again

Any side effects harmful enough to warrant complicating this
test ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Rik van Riel

On Tue, 8 May 2001, David S. Miller wrote:
 Marcelo Tosatti writes:
   Ok, this patch implements thet thing and also changes ext2+swap+shm
   writepage operations (so I could test the thing).
   
   The performance is better with the patch on my restricted swapping tests.
 
 Nice.  Now the only bit left is moving the referenced bit
 checking and/or state into writepage as well.  This is still
 part of the plan right?

Why the hell would we want this ?

If the page is referenced, it should be moved back to the
active list and should never be a candidate for writeout.

I'm very happy we got rid of the horribly intertwined VM
code in 2.2 and went to a more separated design in 2.4...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread David S. Miller


Rik van Riel writes:
  On Tue, 8 May 2001, David S. Miller wrote:
   Nice.  Now the only bit left is moving the referenced bit
   checking and/or state into writepage as well.  This is still
   part of the plan right?
  
  Why the hell would we want this ?

Because if it's a dead swap page the referenced bit is meaningless
and we should just kill off the page immediately.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Rik van Riel

On Sun, 13 May 2001, David S. Miller wrote:
 Rik van Riel writes:
   On Tue, 8 May 2001, David S. Miller wrote:
Nice.  Now the only bit left is moving the referenced bit
checking and/or state into writepage as well.  This is still
part of the plan right?
   
   Why the hell would we want this ?
 
 Because if it's a dead swap page the referenced bit is meaningless
 and we should just kill off the page immediately.

Then I'd rather check this in a visible place in page_launder()
itself. Granted, this is a special case, but I don't think this
one is worth obfuscating the code for...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread Rik van Riel

On Sun, 13 May 2001, Linus Torvalds wrote:
 On Sun, 13 May 2001, Rik van Riel wrote:
  
  Why the hell would we want this ?
 You've missed about half the discussion, it seems..

True, I was away at a conference ;)

  If the page is referenced, it should be moved back to the
  active list and should never be a candidate for writeout.
 
 Wrong.
 
 There are
  (a) dead swap pages, where it doesn't matter one _whit_ whether it is
  referenced or not, because we know with 100% certainty that nobody
  will ever reference it again. This _may_ be true in other cases too,
  but we know it is true for swap pages that have lost all references.
  (b) filesystems and memory allocators that might want to get feedback on
  the fact that we're even _looking_ at their pages, and that we're
  aging them down. They might easily use these things for starting
  background activity like deciding to close the logs..
 
 The high-level VM layer simply doesn't have that kind of information.

Agreed.  I'd like to make sure, however, that we keep the
high-level VM cleanly separated from the lower layers so
we can keep the VM maintainable and predictable...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-13 Thread David S. Miller


Rik van Riel writes:
  Then I'd rather check this in a visible place in page_launder()
  itself. Granted, this is a special case, but I don't think this
  one is worth obfuscating the code for...

I think Linus's scheme is just fine, controlling the new 'priority'
argument to writepage() using the referenced bit as an input.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-10 Thread Anuradha Ratnaweera


On Mon, 7 May 2001, J . A . Magallon wrote:

> 
> On 05.07 Helge Hafting wrote:
> >
> > !0 is 1.  !(anything else) is 0.  It is zero and one, not
> > zero and "non-zero".  So a !! construction gives zero if you have
> > zero, and one if you had anything else.  There's no doubt about it.
> > >
> 
> Isn't this asking for trouble with the optimizer ? It could kill both
> !!. Using that is like trusting on a certain struct padding-alignment.
> 

It isn't, or rather it can't. Because !!x is not x unless x is one or
zero.

Anuradha

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-10 Thread Ingo Oeser

On Tue, May 08, 2001 at 09:52:15AM +0200, Helge Hafting wrote:
> > Isn't this asking for trouble with the optimizer ? It could kill both
> > !!. Using that is like trusting on a certain struct padding-alignment.
> 
> No, this won't cause trouble with the optimizer, because the
> optimizer isn't supposed to do _wrong_ things.
 
Right. The optimizer proves equivalence of terms and exchange the
one that are bad for the optimization goal (e.g performance,
speed, size) against the one that works more towards this goal.

Everything else is an optimizer BUG, which should be reported and
fixed.

The C{89,99} standard now defines the syntax and semantics of
theses terms. 

Relevant for the optimizer: possible values of terms, assumptions
made on the static and dynamic behavior of these terms (add
anything I forgot).

So the optimizer should NEVER cause trouble if you write
completely valid C{89,99} and the compiler and environment
implement 100% of the semantics of it.

Compiler specific features should be seen as an addition to the
standard on this compiler. They follow the same rules stated
above.

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag 
  been there and had much fun   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-10 Thread Anuradha Ratnaweera


On Mon, 7 May 2001, J . A . Magallon wrote:

 
 On 05.07 Helge Hafting wrote:
 
  !0 is 1.  !(anything else) is 0.  It is zero and one, not
  zero and non-zero.  So a !! construction gives zero if you have
  zero, and one if you had anything else.  There's no doubt about it.
  
 
 Isn't this asking for trouble with the optimizer ? It could kill both
 !!. Using that is like trusting on a certain struct padding-alignment.
 

It isn't, or rather it can't. Because !!x is not x unless x is one or
zero.

Anuradha

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-10 Thread Ingo Oeser

On Tue, May 08, 2001 at 09:52:15AM +0200, Helge Hafting wrote:
  Isn't this asking for trouble with the optimizer ? It could kill both
  !!. Using that is like trusting on a certain struct padding-alignment.
 
 No, this won't cause trouble with the optimizer, because the
 optimizer isn't supposed to do _wrong_ things.
 
Right. The optimizer proves equivalence of terms and exchange the
one that are bad for the optimization goal (e.g performance,
speed, size) against the one that works more towards this goal.

Everything else is an optimizer BUG, which should be reported and
fixed.

The C{89,99} standard now defines the syntax and semantics of
theses terms. 

Relevant for the optimizer: possible values of terms, assumptions
made on the static and dynamic behavior of these terms (add
anything I forgot).

So the optimizer should NEVER cause trouble if you write
completely valid C{89,99} and the compiler and environment
implement 100% of the semantics of it.

Compiler specific features should be seen as an addition to the
standard on this compiler. They follow the same rules stated
above.

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag http://www.tu-chemnitz.de/linux/tag
  been there and had much fun   
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-09 Thread Marcelo Tosatti


On Wed, 9 May 2001, David S. Miller wrote:

> 
> Marcelo Tosatti writes:
>  > > Let me state it a different way, how is the new writepage() framework
>  > > going to do things like ignore the referenced bit during page_launder
>  > > for dead swap pages?
>  > 
>  > Its not able to ignore the referenced bit. 
>  > 
>  > I know we want that, but I can't see any clean way of doing that. 
> 
> Unfortunately, one way would involve pushing the referenced bit check
> into the writepage() method.  But that is the only place we have the
> kind of information necessary to make these decisions.
> 
> This is exactly the kind of issue Linus and myself were talking
> about when the "cost analysis" parts of thie discussion began.

I'm not convinced that moving the referenced bit check inside writepage()
is worth it. We will duplicate code which is purely a VM job (ie check the
referenced bit and possibly remove the page from the inactive dirty list
and add it to the active list, _with_ the pagemap_lru_lock held), not a
pager job.

The PageDirty bit handling is different IMHO --- the pager is the one who
knows if it actually wrote the page or not. 










-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-09 Thread Marcelo Tosatti



On Wed, 9 May 2001, David S. Miller wrote:

> 
> Marcelo Tosatti writes:
>  > You want writepage() to check/clean the referenced bit and move the page
>  > to the active list itself ?
> 
> Well, that's the other part of what my patch was doing.
> 
> Let me state it a different way, how is the new writepage() framework
> going to do things like ignore the referenced bit during page_launder
> for dead swap pages?

Its not able to ignore the referenced bit. 

I know we want that, but I can't see any clean way of doing that. 

Suggestions ? 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-09 Thread David S. Miller


Marcelo Tosatti writes:
 > You want writepage() to check/clean the referenced bit and move the page
 > to the active list itself ?

Well, that's the other part of what my patch was doing.

Let me state it a different way, how is the new writepage() framework
going to do things like ignore the referenced bit during page_launder
for dead swap pages?

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-09 Thread Marcelo Tosatti


On Tue, 8 May 2001, David S. Miller wrote:

> 
> Marcelo Tosatti writes:
>  > Ok, this patch implements thet thing and also changes ext2+swap+shm
>  > writepage operations (so I could test the thing).
>  > 
>  > The performance is better with the patch on my restricted swapping tests.
> 
> Nice.  Now the only bit left is moving the referenced bit
> checking and/or state into writepage as well.  This is still
> part of the plan right?

Hum...

You want writepage() to check/clean the referenced bit and move the page
to the active list itself ?


 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-09 Thread Martin Dalecki

Rusty Russell wrote:
> 
> In message <[EMAIL PROTECTED]> you write:
> >
> > Jonathan Morton writes:
> >  > >-  page_count(page) == (1 + !!page->buffers));
> >  >
> >  > Two inversions in a row?
> >
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
> 
> Overall, I'd have to say that this:
> 
> -   dead_swap_page =
> -   (PageSwapCache(page) &&
> -page_count(page) == (1 + !!page->buffers));
> -
> 
> Is nicer as:
> 
> int dead_swap_page = 0;
> 
> if (PageSwapCache(page)
> && page_count(page) == (page->buffers ? 1 : 2))
> dead_swap_page = 1;
> 
> After all, the second is what the code *means* (1 and 2 are magic
> numbers).
> 
> That said, anyone who doesn't understand the former should probably
> get some more C experience before commenting on others' code...

Basically Amen.

But there are may be better chances that the compiler does do
better job at branch prediction in the second case? 
Wenn anyway objdump -S should show it...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-09 Thread David S. Miller


Marcelo Tosatti writes:
  You want writepage() to check/clean the referenced bit and move the page
  to the active list itself ?

Well, that's the other part of what my patch was doing.

Let me state it a different way, how is the new writepage() framework
going to do things like ignore the referenced bit during page_launder
for dead swap pages?

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-09 Thread Marcelo Tosatti



On Wed, 9 May 2001, David S. Miller wrote:

 
 Marcelo Tosatti writes:
   You want writepage() to check/clean the referenced bit and move the page
   to the active list itself ?
 
 Well, that's the other part of what my patch was doing.
 
 Let me state it a different way, how is the new writepage() framework
 going to do things like ignore the referenced bit during page_launder
 for dead swap pages?

Its not able to ignore the referenced bit. 

I know we want that, but I can't see any clean way of doing that. 

Suggestions ? 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-09 Thread Marcelo Tosatti


On Wed, 9 May 2001, David S. Miller wrote:

 
 Marcelo Tosatti writes:
Let me state it a different way, how is the new writepage() framework
going to do things like ignore the referenced bit during page_launder
for dead swap pages?
   
   Its not able to ignore the referenced bit. 
   
   I know we want that, but I can't see any clean way of doing that. 
 
 Unfortunately, one way would involve pushing the referenced bit check
 into the writepage() method.  But that is the only place we have the
 kind of information necessary to make these decisions.
 
 This is exactly the kind of issue Linus and myself were talking
 about when the cost analysis parts of thie discussion began.

I'm not convinced that moving the referenced bit check inside writepage()
is worth it. We will duplicate code which is purely a VM job (ie check the
referenced bit and possibly remove the page from the inactive dirty list
and add it to the active list, _with_ the pagemap_lru_lock held), not a
pager job.

The PageDirty bit handling is different IMHO --- the pager is the one who
knows if it actually wrote the page or not. 










-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-09 Thread Martin Dalecki

Rusty Russell wrote:
 
 In message [EMAIL PROTECTED] you write:
 
  Jonathan Morton writes:
-  page_count(page) == (1 + !!page-buffers));
   
Two inversions in a row?
 
  It is the most straightforward way to make a '1' or '0'
  integer from the NULL state of a pointer.
 
 Overall, I'd have to say that this:
 
 -   dead_swap_page =
 -   (PageSwapCache(page) 
 -page_count(page) == (1 + !!page-buffers));
 -
 
 Is nicer as:
 
 int dead_swap_page = 0;
 
 if (PageSwapCache(page)
  page_count(page) == (page-buffers ? 1 : 2))
 dead_swap_page = 1;
 
 After all, the second is what the code *means* (1 and 2 are magic
 numbers).
 
 That said, anyone who doesn't understand the former should probably
 get some more C experience before commenting on others' code...

Basically Amen.

But there are may be better chances that the compiler does do
better job at branch prediction in the second case? 
Wenn anyway objdump -S should show it...
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-09 Thread Marcelo Tosatti


On Tue, 8 May 2001, David S. Miller wrote:

 
 Marcelo Tosatti writes:
   Ok, this patch implements thet thing and also changes ext2+swap+shm
   writepage operations (so I could test the thing).
   
   The performance is better with the patch on my restricted swapping tests.
 
 Nice.  Now the only bit left is moving the referenced bit
 checking and/or state into writepage as well.  This is still
 part of the plan right?

Hum...

You want writepage() to check/clean the referenced bit and move the page
to the active list itself ?


 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Jonathan Morton

>That said, anyone who doesn't understand the former should probably
>get some more C experience before commenting on others' code...

I understood it, but it looked very much like a typo.

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-END GEEK CODE BLOCK-


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Rusty Russell

In message <[EMAIL PROTECTED]> you write:
> 
> Jonathan Morton writes:
>  > >-  page_count(page) == (1 + !!page->buffers));
>  > 
>  > Two inversions in a row?
> 
> It is the most straightforward way to make a '1' or '0'
> integer from the NULL state of a pointer.

Overall, I'd have to say that this:

-   dead_swap_page =
-   (PageSwapCache(page) &&
-page_count(page) == (1 + !!page->buffers));
-

Is nicer as:

int dead_swap_page = 0;

if (PageSwapCache(page)
&& page_count(page) == (page->buffers ? 1 : 2))
dead_swap_page = 1;

After all, the second is what the code *means* (1 and 2 are magic
numbers).

That said, anyone who doesn't understand the former should probably
get some more C experience before commenting on others' code...

Rusty.
--
Premature optmztion is rt of all evl. --DK
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread David S. Miller


Marcelo Tosatti writes:
 > Ok, this patch implements thet thing and also changes ext2+swap+shm
 > writepage operations (so I could test the thing).
 > 
 > The performance is better with the patch on my restricted swapping tests.

Nice.  Now the only bit left is moving the referenced bit
checking and/or state into writepage as well.  This is still
part of the plan right?

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Marcelo Tosatti



On Tue, 8 May 2001, Linus Torvalds wrote:

> 
> 
> On Tue, 8 May 2001, Marcelo Tosatti wrote:
> >
> > There are two issues which I missed yesterday: we have to get a reference
> > on the page, mark it clean, drop the locks and then call writepage(). If
> > the writepage() fails, we'll have to set_page_dirty(page).
> 
> We can move the "mark it clean" into writepage, which would actually
> simplify the error cases for shared memory writepage (no need to mark it
> dirty again etc).
> 
> > I guess this is too much overhead for the common case, don't you?
> 
> You could easily be right.
> 
> On the other hand, remember that a noticeable part of the time you should
> be seeing a real write too, so the CPU overhead compared to the IO might
> not be prohibitive. Ie, let's assuem that 10% of the time we actually end
> up doing writes, then that 10% is going to be _soo_ much more than the
> extra 10 cycles 90% of the time that the cleanup may well be worth it.
> 
> Especially if the cleanup means that we can avoid doing some of the real
> writes altogether, by being better able to release dead memory to the
> system.

Ok, this patch implements thet thing and also changes ext2+swap+shm
writepage operations (so I could test the thing).

The performance is better with the patch on my restricted swapping tests.

In case you don't have any problems with this I'll fix the other
writepage's (so tell me if its ok for you).


diff -Nur --exclude-from=exclude linux.orig/fs/buffer.c linux/fs/buffer.c
--- linux.orig/fs/buffer.c  Mon May  7 20:47:26 2001
+++ linux/fs/buffer.c   Tue May  8 22:04:00 2001
@@ -1933,12 +1933,17 @@
return err;
 }
 
-int block_write_full_page(struct page *page, get_block_t *get_block)
+int block_write_full_page(struct page *page, get_block_t *get_block, int priority)
 {
struct inode *inode = page->mapping->host;
unsigned long end_index = inode->i_size >> PAGE_CACHE_SHIFT;
unsigned offset;
int err;
+
+   if (!priority)
+   return -1;
+
+   ClearPageDirty(page);
 
/* easy case */
if (page->index < end_index)
diff -Nur --exclude-from=exclude linux.orig/fs/ext2/inode.c linux/fs/ext2/inode.c
--- linux.orig/fs/ext2/inode.c  Mon May  7 20:47:26 2001
+++ linux/fs/ext2/inode.c   Tue May  8 20:46:54 2001
@@ -650,9 +650,9 @@
return NULL;
 }
 
-static int ext2_writepage(struct page *page)
+static int ext2_writepage(struct page *page, int priority)
 {
-   return block_write_full_page(page,ext2_get_block);
+   return block_write_full_page(page,ext2_get_block,priority);
 }
 static int ext2_readpage(struct file *file, struct page *page)
 {
diff -Nur --exclude-from=exclude linux.orig/include/linux/fs.h linux/include/linux/fs.h
--- linux.orig/include/linux/fs.h   Tue May  8 16:45:42 2001
+++ linux/include/linux/fs.hTue May  8 22:22:38 2001
@@ -362,7 +362,7 @@
 struct address_space;
 
 struct address_space_operations {
-   int (*writepage)(struct page *);
+   int (*writepage)(struct page *, int);
int (*readpage)(struct file *, struct page *);
int (*sync_page)(struct page *);
int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
@@ -1268,7 +1268,7 @@
 /* Generic buffer handling for block filesystems.. */
 extern int block_flushpage(struct page *, unsigned long);
 extern int block_symlink(struct inode *, const char *, int);
-extern int block_write_full_page(struct page*, get_block_t*);
+extern int block_write_full_page(struct page*, get_block_t*, int);
 extern int block_read_full_page(struct page*, get_block_t*);
 extern int block_prepare_write(struct page*, unsigned, unsigned, get_block_t*);
 extern int cont_prepare_write(struct page*, unsigned, unsigned, get_block_t*,
diff -Nur --exclude-from=exclude linux.orig/mm/filemap.c linux/mm/filemap.c
--- linux.orig/mm/filemap.c Mon May  7 20:47:26 2001
+++ linux/mm/filemap.c  Tue May  8 22:22:50 2001
@@ -411,7 +411,7 @@
  */
 void filemap_fdatasync(struct address_space * mapping)
 {
-   int (*writepage)(struct page *) = mapping->a_ops->writepage;
+   int (*writepage)(struct page *, int) = mapping->a_ops->writepage;
 
spin_lock(_lock);
 
@@ -430,8 +430,7 @@
lock_page(page);
 
if (PageDirty(page)) {
-   ClearPageDirty(page);
-   writepage(page);
+   writepage(page, 1);
} else
UnlockPage(page);
 
diff -Nur --exclude-from=exclude linux.orig/mm/shmem.c linux/mm/shmem.c
--- linux.orig/mm/shmem.c   Mon May  7 20:47:26 2001
+++ linux/mm/shmem.cTue May  8 22:23:01 2001
@@ -221,13 +221,16 @@
  * once.  We still need to guard against racing with
  * shmem_getpage_locked().  
  */
-static int shmem_writepage(struct page * page)
+static int shmem_writepage(struct page * page, int priority)
 {
int error = 0;
struct shmem_inode_info *info;
swp_entry_t 

Re: page_launder() bug

2001-05-08 Thread Linus Torvalds



On Tue, 8 May 2001, Marcelo Tosatti wrote:
>
> There are two issues which I missed yesterday: we have to get a reference
> on the page, mark it clean, drop the locks and then call writepage(). If
> the writepage() fails, we'll have to set_page_dirty(page).

We can move the "mark it clean" into writepage, which would actually
simplify the error cases for shared memory writepage (no need to mark it
dirty again etc).

> I guess this is too much overhead for the common case, don't you?

You could easily be right.

On the other hand, remember that a noticeable part of the time you should
be seeing a real write too, so the CPU overhead compared to the IO might
not be prohibitive. Ie, let's assuem that 10% of the time we actually end
up doing writes, then that 10% is going to be _soo_ much more than the
extra 10 cycles 90% of the time that the cleanup may well be worth it.

Especially if the cleanup means that we can avoid doing some of the real
writes altogether, by being better able to release dead memory to the
system.

Tradeoffs..

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Kai Henningsen

[EMAIL PROTECTED] (Horst von Brand)  wrote on 07.05.01 in 
<[EMAIL PROTECTED]>:

> "David S. Miller" <[EMAIL PROTECTED]> said:
> > Jonathan Morton writes:
> >  > >-page_count(page) == (1 + !!page->buffers));
> >  >
> >  > Two inversions in a row?
> >
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
>
> IMVHO, it is clearer to write:
>
>   page_count(page) == 1 + (page->buffers != NULL)
>
> At least, the original poster wouldn't have wondered, and I wouldn't have
> had to think a bit to find out what it meant... If gcc generates worse code
> for this, it should be fixed.

Huh. IMO, that is significantly *less* readable. And incidentally I'd be  
less certain that it actually does what you want - it is rather easy to  
convince yourself that !! has to do the right thing.

MfG Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Mikulas Patocka

>  > My point is that its _ok_ for us to check if the page is a dead swap cache
>  > page _without_ the lock since writepage() will recheck again with the page
>  > _locked_. Quoting you two messages back: 
>  > 
>  > "But it is important to re-calculate the deadness after getting the lock.
>  > Before, it was just an informed guess. After the lock, it is knowledge."
>  > 
>  > See ? 
> 
> In fact my patch isn't changing writepage behavior wrt. that page, it
> is changing behavior with respect to laundering policy for that page.
> 
> Here, let's talk code a little bit so there are no misunderstandings,
> I really want to put this to rest:
> 
> + int dead_swap_page;
> +
>   page = list_entry(page_lru, struct page, lru);
>  
> + dead_swap_page =
> + (PageSwapCache(page) &&
> +  page_count(page) == (1 + !!page->buffers));
> +
> 
> Calculate dead_swap_page outside of lock.
> 
>   /* Page is or was in use?  Move it to the active list. */
> - if (PageTestandClearReferenced(page) || page->age > 0 ||
> - (!page->buffers && page_count(page) > 1) ||
> - page_ramdisk(page)) {
 ^
> + if (!dead_swap_page &&
> + (PageTestandClearReferenced(page) || page->age > 0 ||
> +  (!page->buffers && page_count(page) > 1) ||
> +  page_ramdisk(page))) {
^^
>   del_page_from_inactive_dirty_list(page);
>   add_page_to_active_list(page);
>   continue;

#define page_ramdisk(page) \
(page->buffers && (MAJOR(page->buffers->b_dev) == RAMDISK_MAJOR))

Are you sure that no one will release buffers under your hands?

Mikulas


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Marcelo Tosatti



On Mon, 7 May 2001, Linus Torvalds wrote:

> In fact, it might even clean stuff up. Who knows? At least
> page_launder() would not need to know about magic dead swap pages, because
> the decision would be entirely in writepage().
> 
> And there aren't that many writepage() implementations in the kernel to
> fix up (and the fixup tends to be two simple added lines of code for most
> of them - just the "if (!priority) return").
> 
> Also note how some filesystems might use the writepage() callback even
> with a zero priority as a hint that things are approaching the point where
> we need to start flushing, which might make a difference for deciding when
> to try to write a log entry, for example.

Moreover, the filesystem may want to return "-1" even if "priority" is
non-zero --- think about delayed allocations. (the XFS guys were just
complaining about page_launder() not checking the return value of
writepage() so they could not do this kind of thing with delayed
allocations sometime ago).

> Now, I'm not saying this is _the_ solution to it, but I don't see any
> really clean alternatives.

I like it --- it pushes down control to the pagers so they can be smarter.

Will send a patch later.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread BERECZ Szabolcs

On Mon, 7 May 2001, David S. Miller wrote:

> My patch is crap and can cause corruptions, there is not argument
> about it now :-)
is it the only bug in the swap handling?
or why is this bug triggered so heavily if the swap is on a filesystem?
I had oopses when I used a swapfile on a partition, but that was really
rare. I even don't think it's becouse page_launder().
so what's so different if the swap sits on a filesystem?

Bye,
Szabi


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread David S. Miller


Linus Torvalds writes:
 > Maybe it's academic. Do we know that any of this actually makes any
 > performance difference at all?

We know that dirty swap pages can accumulate to the point where the
swapper starves before it gets to enough of the "second pass" cases of
the page_launder loop to run in order to get rid of them.

That was the behavior I saw that let me to hacking up my broken patch.

Even simple "memory hog" test programs can trigger this behavior.
Just write a program which "grows almost completely into swap then
subsides" over and over again.

Later,
David S. Miller
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Helge Hafting

"J . A . Magallon" wrote:
> 
> On 05.07 Helge Hafting wrote:

> > !0 is 1.  !(anything else) is 0.  It is zero and one, not
> > zero and "non-zero".  So a !! construction gives zero if you have
> > zero, and one if you had anything else.  There's no doubt about it.
> > >
> 
> Isn't this asking for trouble with the optimizer ? It could kill both
> !!. Using that is like trusting on a certain struct padding-alignment.

No, this won't cause trouble with the optimizer, because the
optimizer isn't supposed to do _wrong_ things.

The optimizer will not remove two !! in a row, simply because that
_isn't_ a valid optimization as you just have seen.  
"!" is a logical not, it isn't a bitwise not.  The result of
!!(something)
is different from just (something) whenever (something) is
neither 0 or 1, the optimizer knows that very well.

Use gcc -S to get readable assembly.
Try these yourself, they compile to different code:
int main(int argc){return argc;}
int main(int argc){return !!argc;}
They have to be different, as argc >= 2 gives different results.

And try:
int main(int argc){return !argc;}
int main(int argc){return !!!argc;}

These gives the same code with or without optimization with
gcc 2.95.4 on i386, as they are equivalent.  The first ! normalize
to 1 or 0, the two others are then removeable.

Helge Hafting
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Linus Torvalds


On Mon, 7 May 2001, David S. Miller wrote:
> 
> The only downside would be that the formerly "quick case" in the loop
> of dealing with referenced pages would now need to go inside the page
> lock.  It's probably a non-issue...

It might easily be an issue. That function will touch pretty much every
single page that we ever want to free, and it might be worthwhile to know
what the pressure is.

However, the point is probably moot. I found a problem with my approach:
using writepage() to try to get rid of swap cache pages early on (ie not
doing the "if it is accessed, put it back on the list" thing early)
doesn't work all that well: it doesn't handle the case of _clean_
swap-cache pages at all. And those can be quite common, although usually
not in the simple benchmarks which just dirty as quickly as they can.

[ The way to get a clean swap-cache page is to dirty it early in the
  process lifetime, and then use the page read-only later on over
  time. Maybe it's not common enough to worry about. ]

Ho humm. 

Maybe we just can't avoid special-casing the swap cache in page_launder,
and letting it know about things like swap counts (well, we obviously
_can_ avoid the special casing, as that's what it does now. But we might
be losing out by not doing so - right now we at least avoid doing
unnecessary writes, but we might be trying too hard to free memory that
really _should_ be more easily free'd).

Maybe it's academic. Do we know that any of this actually makes any
performance difference at all?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Kai Henningsen

[EMAIL PROTECTED] (Horst von Brand)  wrote on 07.05.01 in 
[EMAIL PROTECTED]:

 David S. Miller [EMAIL PROTECTED] said:
  Jonathan Morton writes:
-page_count(page) == (1 + !!page-buffers));
   
Two inversions in a row?
 
  It is the most straightforward way to make a '1' or '0'
  integer from the NULL state of a pointer.

 IMVHO, it is clearer to write:

   page_count(page) == 1 + (page-buffers != NULL)

 At least, the original poster wouldn't have wondered, and I wouldn't have
 had to think a bit to find out what it meant... If gcc generates worse code
 for this, it should be fixed.

Huh. IMO, that is significantly *less* readable. And incidentally I'd be  
less certain that it actually does what you want - it is rather easy to  
convince yourself that !! has to do the right thing.

MfG Kai
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Mikulas Patocka

   My point is that its _ok_ for us to check if the page is a dead swap cache
   page _without_ the lock since writepage() will recheck again with the page
   _locked_. Quoting you two messages back: 
   
   But it is important to re-calculate the deadness after getting the lock.
   Before, it was just an informed guess. After the lock, it is knowledge.
   
   See ? 
 
 In fact my patch isn't changing writepage behavior wrt. that page, it
 is changing behavior with respect to laundering policy for that page.
 
 Here, let's talk code a little bit so there are no misunderstandings,
 I really want to put this to rest:
 
 + int dead_swap_page;
 +
   page = list_entry(page_lru, struct page, lru);
  
 + dead_swap_page =
 + (PageSwapCache(page) 
 +  page_count(page) == (1 + !!page-buffers));
 +
 
 Calculate dead_swap_page outside of lock.
 
   /* Page is or was in use?  Move it to the active list. */
 - if (PageTestandClearReferenced(page) || page-age  0 ||
 - (!page-buffers  page_count(page)  1) ||
 - page_ramdisk(page)) {
 ^
 + if (!dead_swap_page 
 + (PageTestandClearReferenced(page) || page-age  0 ||
 +  (!page-buffers  page_count(page)  1) ||
 +  page_ramdisk(page))) {
^^
   del_page_from_inactive_dirty_list(page);
   add_page_to_active_list(page);
   continue;

#define page_ramdisk(page) \
(page-buffers  (MAJOR(page-buffers-b_dev) == RAMDISK_MAJOR))

Are you sure that no one will release buffers under your hands?

Mikulas


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Linus Torvalds



On Tue, 8 May 2001, Marcelo Tosatti wrote:

 There are two issues which I missed yesterday: we have to get a reference
 on the page, mark it clean, drop the locks and then call writepage(). If
 the writepage() fails, we'll have to set_page_dirty(page).

We can move the mark it clean into writepage, which would actually
simplify the error cases for shared memory writepage (no need to mark it
dirty again etc).

 I guess this is too much overhead for the common case, don't you?

You could easily be right.

On the other hand, remember that a noticeable part of the time you should
be seeing a real write too, so the CPU overhead compared to the IO might
not be prohibitive. Ie, let's assuem that 10% of the time we actually end
up doing writes, then that 10% is going to be _soo_ much more than the
extra 10 cycles 90% of the time that the cleanup may well be worth it.

Especially if the cleanup means that we can avoid doing some of the real
writes altogether, by being better able to release dead memory to the
system.

Tradeoffs..

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Helge Hafting

J . A . Magallon wrote:
 
 On 05.07 Helge Hafting wrote:

  !0 is 1.  !(anything else) is 0.  It is zero and one, not
  zero and non-zero.  So a !! construction gives zero if you have
  zero, and one if you had anything else.  There's no doubt about it.
  
 
 Isn't this asking for trouble with the optimizer ? It could kill both
 !!. Using that is like trusting on a certain struct padding-alignment.

No, this won't cause trouble with the optimizer, because the
optimizer isn't supposed to do _wrong_ things.

The optimizer will not remove two !! in a row, simply because that
_isn't_ a valid optimization as you just have seen.  
! is a logical not, it isn't a bitwise not.  The result of
!!(something)
is different from just (something) whenever (something) is
neither 0 or 1, the optimizer knows that very well.

Use gcc -S to get readable assembly.
Try these yourself, they compile to different code:
int main(int argc){return argc;}
int main(int argc){return !!argc;}
They have to be different, as argc = 2 gives different results.

And try:
int main(int argc){return !argc;}
int main(int argc){return !!!argc;}

These gives the same code with or without optimization with
gcc 2.95.4 on i386, as they are equivalent.  The first ! normalize
to 1 or 0, the two others are then removeable.

Helge Hafting
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread BERECZ Szabolcs

On Mon, 7 May 2001, David S. Miller wrote:

 My patch is crap and can cause corruptions, there is not argument
 about it now :-)
is it the only bug in the swap handling?
or why is this bug triggered so heavily if the swap is on a filesystem?
I had oopses when I used a swapfile on a partition, but that was really
rare. I even don't think it's becouse page_launder().
so what's so different if the swap sits on a filesystem?

Bye,
Szabi


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Marcelo Tosatti



On Tue, 8 May 2001, Linus Torvalds wrote:

 
 
 On Tue, 8 May 2001, Marcelo Tosatti wrote:
 
  There are two issues which I missed yesterday: we have to get a reference
  on the page, mark it clean, drop the locks and then call writepage(). If
  the writepage() fails, we'll have to set_page_dirty(page).
 
 We can move the mark it clean into writepage, which would actually
 simplify the error cases for shared memory writepage (no need to mark it
 dirty again etc).
 
  I guess this is too much overhead for the common case, don't you?
 
 You could easily be right.
 
 On the other hand, remember that a noticeable part of the time you should
 be seeing a real write too, so the CPU overhead compared to the IO might
 not be prohibitive. Ie, let's assuem that 10% of the time we actually end
 up doing writes, then that 10% is going to be _soo_ much more than the
 extra 10 cycles 90% of the time that the cleanup may well be worth it.
 
 Especially if the cleanup means that we can avoid doing some of the real
 writes altogether, by being better able to release dead memory to the
 system.

Ok, this patch implements thet thing and also changes ext2+swap+shm
writepage operations (so I could test the thing).

The performance is better with the patch on my restricted swapping tests.

In case you don't have any problems with this I'll fix the other
writepage's (so tell me if its ok for you).


diff -Nur --exclude-from=exclude linux.orig/fs/buffer.c linux/fs/buffer.c
--- linux.orig/fs/buffer.c  Mon May  7 20:47:26 2001
+++ linux/fs/buffer.c   Tue May  8 22:04:00 2001
@@ -1933,12 +1933,17 @@
return err;
 }
 
-int block_write_full_page(struct page *page, get_block_t *get_block)
+int block_write_full_page(struct page *page, get_block_t *get_block, int priority)
 {
struct inode *inode = page-mapping-host;
unsigned long end_index = inode-i_size  PAGE_CACHE_SHIFT;
unsigned offset;
int err;
+
+   if (!priority)
+   return -1;
+
+   ClearPageDirty(page);
 
/* easy case */
if (page-index  end_index)
diff -Nur --exclude-from=exclude linux.orig/fs/ext2/inode.c linux/fs/ext2/inode.c
--- linux.orig/fs/ext2/inode.c  Mon May  7 20:47:26 2001
+++ linux/fs/ext2/inode.c   Tue May  8 20:46:54 2001
@@ -650,9 +650,9 @@
return NULL;
 }
 
-static int ext2_writepage(struct page *page)
+static int ext2_writepage(struct page *page, int priority)
 {
-   return block_write_full_page(page,ext2_get_block);
+   return block_write_full_page(page,ext2_get_block,priority);
 }
 static int ext2_readpage(struct file *file, struct page *page)
 {
diff -Nur --exclude-from=exclude linux.orig/include/linux/fs.h linux/include/linux/fs.h
--- linux.orig/include/linux/fs.h   Tue May  8 16:45:42 2001
+++ linux/include/linux/fs.hTue May  8 22:22:38 2001
@@ -362,7 +362,7 @@
 struct address_space;
 
 struct address_space_operations {
-   int (*writepage)(struct page *);
+   int (*writepage)(struct page *, int);
int (*readpage)(struct file *, struct page *);
int (*sync_page)(struct page *);
int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
@@ -1268,7 +1268,7 @@
 /* Generic buffer handling for block filesystems.. */
 extern int block_flushpage(struct page *, unsigned long);
 extern int block_symlink(struct inode *, const char *, int);
-extern int block_write_full_page(struct page*, get_block_t*);
+extern int block_write_full_page(struct page*, get_block_t*, int);
 extern int block_read_full_page(struct page*, get_block_t*);
 extern int block_prepare_write(struct page*, unsigned, unsigned, get_block_t*);
 extern int cont_prepare_write(struct page*, unsigned, unsigned, get_block_t*,
diff -Nur --exclude-from=exclude linux.orig/mm/filemap.c linux/mm/filemap.c
--- linux.orig/mm/filemap.c Mon May  7 20:47:26 2001
+++ linux/mm/filemap.c  Tue May  8 22:22:50 2001
@@ -411,7 +411,7 @@
  */
 void filemap_fdatasync(struct address_space * mapping)
 {
-   int (*writepage)(struct page *) = mapping-a_ops-writepage;
+   int (*writepage)(struct page *, int) = mapping-a_ops-writepage;
 
spin_lock(pagecache_lock);
 
@@ -430,8 +430,7 @@
lock_page(page);
 
if (PageDirty(page)) {
-   ClearPageDirty(page);
-   writepage(page);
+   writepage(page, 1);
} else
UnlockPage(page);
 
diff -Nur --exclude-from=exclude linux.orig/mm/shmem.c linux/mm/shmem.c
--- linux.orig/mm/shmem.c   Mon May  7 20:47:26 2001
+++ linux/mm/shmem.cTue May  8 22:23:01 2001
@@ -221,13 +221,16 @@
  * once.  We still need to guard against racing with
  * shmem_getpage_locked().  
  */
-static int shmem_writepage(struct page * page)
+static int shmem_writepage(struct page * page, int priority)
 {
int error = 0;
struct shmem_inode_info *info;
swp_entry_t *entry, swap;
struct inode 

Re: page_launder() bug

2001-05-08 Thread Rusty Russell

In message [EMAIL PROTECTED] you write:
 
 Jonathan Morton writes:
   -  page_count(page) == (1 + !!page-buffers));
   
   Two inversions in a row?
 
 It is the most straightforward way to make a '1' or '0'
 integer from the NULL state of a pointer.

Overall, I'd have to say that this:

-   dead_swap_page =
-   (PageSwapCache(page) 
-page_count(page) == (1 + !!page-buffers));
-

Is nicer as:

int dead_swap_page = 0;

if (PageSwapCache(page)
 page_count(page) == (page-buffers ? 1 : 2))
dead_swap_page = 1;

After all, the second is what the code *means* (1 and 2 are magic
numbers).

That said, anyone who doesn't understand the former should probably
get some more C experience before commenting on others' code...

Rusty.
--
Premature optmztion is rt of all evl. --DK
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Jonathan Morton

That said, anyone who doesn't understand the former should probably
get some more C experience before commenting on others' code...

I understood it, but it looked very much like a typo.

--
from: Jonathan Chromatix Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-END GEEK CODE BLOCK-


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread David S. Miller


Marcelo Tosatti writes:
  Ok, this patch implements thet thing and also changes ext2+swap+shm
  writepage operations (so I could test the thing).
  
  The performance is better with the patch on my restricted swapping tests.

Nice.  Now the only bit left is moving the referenced bit
checking and/or state into writepage as well.  This is still
part of the plan right?

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread David S. Miller


Linus Torvalds writes:
  Maybe it's academic. Do we know that any of this actually makes any
  performance difference at all?

We know that dirty swap pages can accumulate to the point where the
swapper starves before it gets to enough of the second pass cases of
the page_launder loop to run in order to get rid of them.

That was the behavior I saw that let me to hacking up my broken patch.

Even simple memory hog test programs can trigger this behavior.
Just write a program which grows almost completely into swap then
subsides over and over again.

Later,
David S. Miller
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Linus Torvalds


On Mon, 7 May 2001, David S. Miller wrote:
 
 The only downside would be that the formerly quick case in the loop
 of dealing with referenced pages would now need to go inside the page
 lock.  It's probably a non-issue...

It might easily be an issue. That function will touch pretty much every
single page that we ever want to free, and it might be worthwhile to know
what the pressure is.

However, the point is probably moot. I found a problem with my approach:
using writepage() to try to get rid of swap cache pages early on (ie not
doing the if it is accessed, put it back on the list thing early)
doesn't work all that well: it doesn't handle the case of _clean_
swap-cache pages at all. And those can be quite common, although usually
not in the simple benchmarks which just dirty as quickly as they can.

[ The way to get a clean swap-cache page is to dirty it early in the
  process lifetime, and then use the page read-only later on over
  time. Maybe it's not common enough to worry about. ]

Ho humm. 

Maybe we just can't avoid special-casing the swap cache in page_launder,
and letting it know about things like swap counts (well, we obviously
_can_ avoid the special casing, as that's what it does now. But we might
be losing out by not doing so - right now we at least avoid doing
unnecessary writes, but we might be trying too hard to free memory that
really _should_ be more easily free'd).

Maybe it's academic. Do we know that any of this actually makes any
performance difference at all?

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-08 Thread Marcelo Tosatti



On Mon, 7 May 2001, Linus Torvalds wrote:

 In fact, it might even clean stuff up. Who knows? At least
 page_launder() would not need to know about magic dead swap pages, because
 the decision would be entirely in writepage().
 
 And there aren't that many writepage() implementations in the kernel to
 fix up (and the fixup tends to be two simple added lines of code for most
 of them - just the if (!priority) return).
 
 Also note how some filesystems might use the writepage() callback even
 with a zero priority as a hint that things are approaching the point where
 we need to start flushing, which might make a difference for deciding when
 to try to write a log entry, for example.

Moreover, the filesystem may want to return -1 even if priority is
non-zero --- think about delayed allocations. (the XFS guys were just
complaining about page_launder() not checking the return value of
writepage() so they could not do this kind of thing with delayed
allocations sometime ago).

 Now, I'm not saying this is _the_ solution to it, but I don't see any
 really clean alternatives.

I like it --- it pushes down control to the pagers so they can be smarter.

Will send a patch later.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Linus Torvalds


On Mon, 7 May 2001, Marcelo Tosatti wrote:
> 
> So what about moving the check for a dead swap cache page from
> swap_writepage() to page_launder() (+ PageSwapCache() check) just before
> the "if (!launder_loop)" ? 
> 
> Yes, its ugly special casing. Any other suggestion ? 

My most favourite approach by far is to just remove the magic for
different writepage's altogether, and just unconditionally do a
writepage. But passing in enough information so that the writepage can
come to the right decision.

So take the old code, and remove the code that does

if (!launder_loop) {
.. move to head ..
continue;
}
writepage(page);

and instead make it do something like

if (writepage(page, launderloop)) {
.. not able to write, move to head ..
continue;
}

where "launderloop" is passed in to the writepage function as a priority.

Then, a regular "writepage()" would just do

if (!priority)
return -1;
.. do real writepage here ..
return 0;

while the special casing of dead swap cache entries can be moved into
swap_writepage() (which would do the above _too_, but would first try to
just drop the page if it can tell that the page is dead).

Now, add in the whole "accessed recently" as part of the priority (ie a
page that had the PG_referenced bit set will always get a "priority 0",
even if we could have laundered it, because we don't actually want to
write it out all that badly), and I think you'll get to where David wanted
to get _without_ any special cases.

In fact, it might even clean stuff up. Who knows? At least
page_launder() would not need to know about magic dead swap pages, because
the decision would be entirely in writepage().

And there aren't that many writepage() implementations in the kernel to
fix up (and the fixup tends to be two simple added lines of code for most
of them - just the "if (!priority) return").

Also note how some filesystems might use the writepage() callback even
with a zero priority as a hint that things are approaching the point where
we need to start flushing, which might make a difference for deciding when
to try to write a log entry, for example.

Now, I'm not saying this is _the_ solution to it, but I don't see any
really clean alternatives.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Marcelo Tosatti writes:
 > > Hmmm, can't this happen without my patch?
 > 
 > No. We will never call writepage() without __GFP_IO without your patch.
 > 

I see, because launder_loop never progresses to 1 in that case.

My patch is crap and can cause corruptions, there is not argument
about it now :-)

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Marcelo Tosatti


On Mon, 7 May 2001, David S. Miller wrote:

> 
> Marcelo Tosatti writes:
>  > I just thought about this case:
>  >   
>  > We find a dead swap cache page, so dead_swap_page goes to 1.
>  > 
>  > We call swap_writepage(), but in the meantime the swapin readahead code   
>  > got a reference on the swap map for the page.
>  > 
>  > We write the page out because "(swap_count(page) > 1)", and we may
>  > not have __GFP_IO set in the gfp_mask. Boom.
> 
> Hmmm, can't this happen without my patch?

No. We will never call writepage() without __GFP_IO without your patch.

> Nothing stops people from getting references to the page
> between the "Page is or was in use?" test and the line
> which does "TryLockPage(page)".

I don't see any problem with people getting a reference to the page there.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Linus Torvalds


On Mon, 7 May 2001, David S. Miller wrote:
> 
> Marcelo Tosatti writes:
>  > I was wrong. The patch is indeed buggy because of the __GFP_IO thing.
> 
> What about the __GFP_IO thing?
> 
> Specifically, what protects the __GFP_IO thing from happening without
> my patch?

This:

/* First time through? Move it to the back of the list */
if (!launder_loop) {
list_del(page_lru);
list_add(page_lru, _dirty_list);
UnlockPage(page);
continue;
}


if (can_get_io_locks && !launder_loop && free_shortage()) {
launder_loop = 1;
...


Notice? If "can_get_io_locks" is not true, we will never call
"writepage()" on the page, because if "can_get_io_locks" is false,
"launder_loop" will always be false too..

And "can_get_io_locks" depends on __GFP_IO.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Linus Torvalds


On Mon, 7 May 2001, David S. Miller wrote:
> 
> Marcelo Tosatti writes:
>  > I just thought about this case:
>  >   
>  > We find a dead swap cache page, so dead_swap_page goes to 1.
>  > 
>  > We call swap_writepage(), but in the meantime the swapin readahead code   
>  > got a reference on the swap map for the page.
>  > 
>  > We write the page out because "(swap_count(page) > 1)", and we may
>  > not have __GFP_IO set in the gfp_mask. Boom.

Yes. That looks a lot easier to trigger than my "slow memory
leak" schenario.

> Hmmm, can't this happen without my patch?

No. The old code would never try to write anything if __GFP_IO wasn't set,
because "launder_loop" would never become non-zero.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Marcelo Tosatti writes:
 > I was wrong. The patch is indeed buggy because of the __GFP_IO thing.

What about the __GFP_IO thing?

Specifically, what protects the __GFP_IO thing from happening without
my patch?

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Marcelo Tosatti writes:
 > I just thought about this case:
 >   
 > We find a dead swap cache page, so dead_swap_page goes to 1.
 > 
 > We call swap_writepage(), but in the meantime the swapin readahead code   
 > got a reference on the swap map for the page.
 > 
 > We write the page out because "(swap_count(page) > 1)", and we may
 > not have __GFP_IO set in the gfp_mask. Boom.

Hmmm, can't this happen without my patch?

Nothing stops people from getting references to the page
between the "Page is or was in use?" test and the line
which does "TryLockPage(page)".

Later,
David S. Miller
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Linus Torvalds writes:
 > YOUR HEURISTIC IS WRONG!

Please start the conversation this way next time.

 > I call that a bug. You don't. Fine.

You made it sound like a data corrupter, a kernel crasher, and that
any bug against a kernel with that patch indicates my patch caused it.
There is an important distinction between "this is doing something
silly" and "this will scramble your disk and crash the kernel".

The latter is the conclusion several people came to.

And I wanted a clarification on this, nothing more.

I wanted this clarification from you _BECAUSE_ the original posting in
this thread saw data corruption which went away after reverting my
patch.  But there is no possible connection between my patch and the
crashes he saw.

Many people have already concluded that "Linus says davem's patch is
crap so it must have been causing the corruptions of the original
reporter."

Like you and I, most people are lazy.  And a lazy person is likely
to treat "bug goes away with reverting patch" plus "Linus says the
patch is crap" as "get rid of the patch, this'll fix the bug".  And
that'll be the end of it, nobody will investigate the true cause of
the problem until it shows up again later for some different reason.

 > But that code isn't coming anywhere _close_ to my tree until the two
 > match. And I stand by my assertion that it should be reverted from Alans
 > tree too.

I have no intent to ever submit that patch to your tree, I know it's
not the right way to solve this problem.

In fact I submitted that patch to you as an "[RFC]" a long time ago
and it fell on deaf ears.  Or perhaps you happened to have laser eye
surgery that particular day, who knows. :-)

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Linus Torvalds


On Mon, 7 May 2001, David S. Miller wrote:
> 
> Here, let's talk code a little bit so there are no misunderstandings,
> I really want to put this to rest:
> 
> Calculate dead_swap_page outside of lock.

NO. That's not what you're doing at all. You're calculating something
completely different that "dead swap page". You're calculating "do we have
a swap cache entry that is not mapped into any virtual memory"?

> If dead_swap_page, ignore referenced bit heuristics.

Which is complete crap. Those reference bits are valid and important
data. You have not computed anything that says otherwise. You have
computed a random number that doesn't tell you anything about whether the
page is dead or not. 

> Really, what does this have to do with swap counts and page counts?
> 
> It's a heuristic. In fact it even seems stupid to me to recalculate
> dead_swap_page after we get the lock just for the sake of these
> heuristics.

YOUR HEURISTIC IS WRONG!

> Maybe I should have diguised this bit as:
> 
> if (dead_swap_page)
>   do_writepage_first_pass = 1;

So tell me: what does the above help?

I repeat: your "dead_swap_page" variable is a random number with
absolutely no meaning. ANYTHING that uses it is buggy. It doesn't help in
the least if you use the first random state to set another random
state: the amount of randomness does not increase or decrease.

See?

> To divert people's brains to what the intent was :-)

I can see the intent.

I can also see that the code doesn't match up to the intent.

I call that a bug. You don't. Fine.

But that code isn't coming anywhere _close_ to my tree until the two
match. And I stand by my assertion that it should be reverted from Alans
tree too.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Linus Torvalds


On Mon, 7 May 2001, Marcelo Tosatti wrote:
> 
> So lets fix it and make it look for the swap counts. 

Ehh.

Which you MUST NOT do without holding the page lock.

Hint: it needs "page->index", and without holding the page lock you don't
know what it could be. An out-of-bounds page index could do anything,
including oopsing the kernel. It so happens that right now you'll only get
a printk from swap_count(), because of defensive programming, but the fact
remains that you _cannot_ decide on whether a page is a dead swap-cache
entry until you've gotten the page lock.

> My point is that its _ok_ for us to check if the page is a dead swap cache
> page _without_ the lock since writepage() will recheck again with the page
> _locked_. Quoting you two messages back: 

Yes. But MY point is that the patch is buggy, and should be reverted. 

Move the swap_count check into the page lock. It's such a heavy operation
that you should, anyway. 

My message is true: you can do some "early out" optimizations. The code
has always done that. But if you look at the old code, it never did any
swap-count calculations or anything like that: it only looked at simple
bits in the "struct page" structure and the page count.

But once the level of complexity reaches actually doing function calls
etc, you should not call them early-out optimizations any more. Especially
not on data that could be stale, and that is used to dereference other
data structures (ie the swap count maps).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Marcelo Tosatti writes:
 > My point is that its _ok_ for us to check if the page is a dead swap cache
 > page _without_ the lock since writepage() will recheck again with the page
 > _locked_. Quoting you two messages back: 
 > 
 > "But it is important to re-calculate the deadness after getting the lock.
 > Before, it was just an informed guess. After the lock, it is knowledge."
 > 
 > See ? 

In fact my patch isn't changing writepage behavior wrt. that page, it
is changing behavior with respect to laundering policy for that page.

Here, let's talk code a little bit so there are no misunderstandings,
I really want to put this to rest:

+   int dead_swap_page;
+
page = list_entry(page_lru, struct page, lru);
 
+   dead_swap_page =
+   (PageSwapCache(page) &&
+page_count(page) == (1 + !!page->buffers));
+

Calculate dead_swap_page outside of lock.

/* Page is or was in use?  Move it to the active list. */
-   if (PageTestandClearReferenced(page) || page->age > 0 ||
-   (!page->buffers && page_count(page) > 1) ||
-   page_ramdisk(page)) {
+   if (!dead_swap_page &&
+   (PageTestandClearReferenced(page) || page->age > 0 ||
+(!page->buffers && page_count(page) > 1) ||
+page_ramdisk(page))) {
del_page_from_inactive_dirty_list(page);
add_page_to_active_list(page);
continue;

If dead_swap_page, ignore referenced bit heuristics.

-   /* First time through? Move it to the back of the list */
-   if (!launder_loop) {
+   /* First time through? Move it to the back of the list,
+* but not if it is a dead swap page. We want to reap
+* those as fast as possible.
+*/
+   if (!launder_loop && !dead_swap_page) {
list_del(page_lru);
list_add(page_lru, _dirty_list);
UnlockPage(page);

If dead_swap_page, ignore launder_loop.  Again, another heuristic
test, not a "state correctness" test.  "launder_loop" is not
protecting "state correctness" of what we do to the page.

Really, what does this have to do with swap counts and page counts?

It's a heuristic. In fact it even seems stupid to me to recalculate
dead_swap_page after we get the lock just for the sake of these
heuristics.

Maybe I should have diguised this bit as:

if (dead_swap_page)
do_writepage_first_pass = 1;

To divert people's brains to what the intent was :-)

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Linus Torvalds writes:
 > What do you expect me to do? The patch is buggy. It should be reverted.
 > What's your problem?

I think the problem he has is that you are acting as if the patch
causes corruptions and will end in failures.  This is how you are
coming across, at least.

Really, your problem with the patch seems to be aesthetic in nature
and that the patch is not doing things cleanly at all.  To this I will
not contest, it surely is not the way to fix this in the end.

If my analysis up to this point is correct, the patch should in fact
not be reverted.  Many patches in Alan's tree are not done in the most
aesthetically pleasing way to you, we all know this. But they do solve
problems for people.  Half of Alan's tree should be reverted if we
followed this rule, really.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Marcelo Tosatti



On Mon, 7 May 2001, Linus Torvalds wrote:

> 
> On Mon, 7 May 2001, Marcelo Tosatti wrote:
> > 
> > So the "dead_swap_page" logic is _not_ buggy and you are full of shit when
> > telling Alan to revert the change. (sorry, I could not avoid this one)
> 
> Well, the problem is that the patch _is_ buggy. 
> 
> swap_writepage() does it right. And dead_swap_page does it wrong. It
> doesn't look at the swap counts, for one thing.

So lets fix it and make it look for the swap counts. 

> The patch should be reverted. The fact that other parts of the system do
> it _right_ is not an argument for mm/vmscan.c to do it wrong.

My point is that its _ok_ for us to check if the page is a dead swap cache
page _without_ the lock since writepage() will recheck again with the page
_locked_. Quoting you two messages back: 

"But it is important to re-calculate the deadness after getting the lock.
Before, it was just an informed guess. After the lock, it is knowledge."

See ? 





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Linus Torvalds


On Mon, 7 May 2001, Marcelo Tosatti wrote:
> 
> So the "dead_swap_page" logic is _not_ buggy and you are full of shit when
> telling Alan to revert the change. (sorry, I could not avoid this one)

Well, the problem is that the patch _is_ buggy. 

swap_writepage() does it right. And dead_swap_page does it wrong. It
doesn't look at the swap counts, for one thing.

The patch should be reverted. The fact that other parts of the system do
it _right_ is not an argument for mm/vmscan.c to do it wrong.

What do you expect me to do? The patch is buggy. It should be reverted.
What's your problem?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Linus Torvalds writes:
 > 
 > On Mon, 7 May 2001, Marcelo Tosatti wrote:
 > > And thats what swap_writepage() is doing:
 > 
 > Ehh.. swap_writepage() is called with the page locked. So it _can_ depend
 > on it.
 > 
 > If the page isn't locked there, then THAT is a bug. A major one.

Linus, he's trying to point out that writepage() will recheck (under
lock) what you think my dead_swap_page thing is not checking :-)

My changes always call writepage() and always redo all the checks
under the proper locks.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Marcelo Tosatti



On Mon, 7 May 2001, Linus Torvalds wrote:

> 
> On Mon, 7 May 2001, Marcelo Tosatti wrote:
> > 
> > On 7 May 2001, Linus Torvalds wrote:
> > 
> > > But it is important to re-calculate the deadness after getting the
> > > lock. Before, it was just an informed guess. After the lock, it is
> > > knowledge. And you can use informed guesses for heuristics, but you
> > > must _not_ use them for any serious decisions.
> > 
> > And thats what swap_writepage() is doing:
> 
> Ehh.. swap_writepage() is called with the page locked. So it _can_ depend
> on it.

So the "dead_swap_page" logic is _not_ buggy and you are full of shit when
telling Alan to revert the change. (sorry, I could not avoid this one)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Linus Torvalds


On Mon, 7 May 2001, Marcelo Tosatti wrote:
> 
> On 7 May 2001, Linus Torvalds wrote:
> 
> > But it is important to re-calculate the deadness after getting the
> > lock. Before, it was just an informed guess. After the lock, it is
> > knowledge. And you can use informed guesses for heuristics, but you
> > must _not_ use them for any serious decisions.
> 
> And thats what swap_writepage() is doing:

Ehh.. swap_writepage() is called with the page locked. So it _can_ depend
on it.

If the page isn't locked there, then THAT is a bug. A major one.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Marcelo Tosatti



On 7 May 2001, Linus Torvalds wrote:

> But it is important to re-calculate the deadness after getting the
> lock. Before, it was just an informed guess. After the lock, it is
> knowledge. And you can use informed guesses for heuristics, but you
> must _not_ use them for any serious decisions.

And thats what swap_writepage() is doing:

static int swap_writepage(struct page *page)
{
/* One for the page cache, one for this user, one for page->buffers */
if (page_count(page) > 2 + !!page->buffers)
goto in_use;
if (swap_count(page) > 1)
goto in_use;

...
}



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Linus Torvalds writes:
 > The whole "dead_swap_page" optimization in the -ac tree is apparentrly
 > completely bogus.  It caches a value that is not valid: you cannot
 > reliably look at whether the page has buffers etc without holding the
 > page locked. 

It caches a value controlling heuristics, not "state".  Specifically
it controls whether we:

1) Ignore the referenced bit, this is fine.

2) Allow writepage() operations in the first pass.  This is fine too.

All normal checks are redone, only heuristics are changed.

Please show me how this is illegal.  Everyone comes to this conclusion
when the first read the code, that I am doing something illegal, then
when I explain what that dead_swap_page thing is doing and they read
it a second time (how shocking! :-) they go "oh, I see".

If the patch is causing problems, it is due to some other bug not my
patch itself.

I do not argue that my patch is "the" way to solve the dead swap page
problem, to the contrary.  Stephen has something which seems to try to
attack this issue in a much nicer way.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread J . A . Magallon


On 05.07 Helge Hafting wrote:
> Tobias Ringstrom wrote:
> > 
> > On Sun, 6 May 2001, David S. Miller wrote:
> > > It is the most straightforward way to make a '1' or '0'
> > > integer from the NULL state of a pointer.
> > 
> > But is it really specified in the C "standards" to be exctly zero or one,
> > and not zero and non-zero?
> 
> !0 is 1.  !(anything else) is 0.  It is zero and one, not
> zero and "non-zero".  So a !! construction gives zero if you have
> zero, and one if you had anything else.  There's no doubt about it.
> >

Isn't this asking for trouble with the optimizer ? It could kill both
!!. Using that is like trusting on a certain struct padding-alignment.

-- 
J.A. Magallon   #  Let the source be with you...
mailto:[EMAIL PROTECTED]
Linux Mandrake release 8.1 (Cooker) for i586
Linux werewolf 2.4.4-ac5 #1 SMP Sat May 5 01:17:07 CEST 2001 i686

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
BERECZ Szabolcs  <[EMAIL PROTECTED]> wrote:
>
>there is a bug in page_launder introduced with kernel 2.4.3-ac12.

Yes.

The whole "dead_swap_page" optimization in the -ac tree is apparentrly
completely bogus.  It caches a value that is not valid: you cannot
reliably look at whether the page has buffers etc without holding the
page locked. 

So calculating "dead_swap_page" without locking the page first is a sure
way to cause trouble.

I can see why the bug was introduced: standard kernels _optimistically_
test whether the condition might be true before they lock the page, and
decide to not even try to touch pages that look like they are probably
going to be considered active.

But it is important to re-calculate the deadness after getting the lock.
Before, it was just an informed guess. After the lock, it is knowledge.
And you can use informed guesses for heuristics, but you must _not_ use
them for any serious decisions.

Alan, please revert.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Horst von Brand

"David S. Miller" <[EMAIL PROTECTED]> said:
> Jonathan Morton writes:
>  > >-  page_count(page) == (1 + !!page->buffers));
>  > 
>  > Two inversions in a row?
> 
> It is the most straightforward way to make a '1' or '0'
> integer from the NULL state of a pointer.

IMVHO, it is clearer to write:

  page_count(page) == 1 + (page->buffers != NULL)

At least, the original poster wouldn't have wondered, and I wouldn't have
had to think a bit to find out what it meant... If gcc generates worse code
for this, it should be fixed.
-- 
Dr. Horst H. von Brand   mailto:[EMAIL PROTECTED]
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria  +56 32 654239
Casilla 110-V, Valparaiso, ChileFax:  +56 32 797513
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread H. Peter Anvin

Followup to:  <[EMAIL PROTECTED]>
By author:Tobias Ringstrom <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
>
> On Sun, 6 May 2001, David S. Miller wrote:
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
> 
> But is it really specified in the C "standards" to be exctly zero or one,
> and not zero and non-zero?
> 

Yes it is.

> 
> IMHO, the ?: construct is way more readable and reliable.
> 

In C99 one can write (bool)foo which is more readable than either,
especially since that is really what one is trying to do.

-hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Daniel Phillips

On Monday 07 May 2001 08:26, Tobias Ringstrom wrote:
> On Sun, 6 May 2001, David S. Miller wrote:
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
>
> But is it really specified in the C "standards" to be exctly zero or
> one, and not zero and non-zero?

Yes, and if we did not have this stupid situation where the C language 
standard is not freely available online then you would not have had to 
ask.

> IMHO, the ?: construct is way more readable and reliable.

There is no difference in reliability.  Readability is a matter of 
opinion - my opinion is that they are equally readable.  To its credit, 
gcc produces the same ia32 code in either case:

int foo = 999;
return 1 + !!foo;

:   movl   $0x3e7,0xfffc(%ebp)
:  cmpl   $0x0,0xfffc(%ebp)
:  je 0x80483e0 
:  mov$0x2,%eax
:  jmp0x80483e5 
:  lea0x0(%esi),%esi
:  mov$0x1,%eax
:  mov%eax,%eax

int foo = 999;
return foo? 2: 1;

:   movl   $0x3e7,0xfffc(%ebp)
:  cmpl   $0x0,0xfffc(%ebp)
:  je 0x80483e0 
:  mov$0x2,%eax
:  jmp0x80483e5 
:  lea0x0(%esi),%esi
:  mov$0x1,%eax
:  mov%eax,%eax

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Alan Cox

> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
> 
> But is it really specified in the C "standards" to be exctly zero or one,
> and not zero and non-zero?

Yes. (Fortunately since when this argument occurred Linus said he would eat
his underpants if he was wrong)

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Helge Hafting

Tobias Ringstrom wrote:
> 
> On Sun, 6 May 2001, David S. Miller wrote:
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
> 
> But is it really specified in the C "standards" to be exctly zero or one,
> and not zero and non-zero?

!0 is 1.  !(anything else) is 0.  It is zero and one, not
zero and "non-zero".  So a !! construction gives zero if you have
zero, and one if you had anything else.  There's no doubt about it.
> 
> IMHO, the ?: construct is way more readable and reliable.

Thats your opinion.  There are many others.  Some don't like the
?: at all, for example.  And some like all valid C.

Helge Hafting
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Tobias Ringstrom writes:
 > But is it really specified in the C "standards" to be exctly zero or one,
 > and not zero and non-zero?

I'm pretty sure it does.

 > IMHO, the ?: construct is way more readable and reliable.

Well identical code has been there for several months just a few lines
away.

I've seen this idiom used in many places (even the GCC sources :-),
so I'm rather surprised people are seeing it for the first time.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Tobias Ringstrom

On Sun, 6 May 2001, David S. Miller wrote:
> It is the most straightforward way to make a '1' or '0'
> integer from the NULL state of a pointer.

But is it really specified in the C "standards" to be exctly zero or one,
and not zero and non-zero?

IMHO, the ?: construct is way more readable and reliable.

/Tobias

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Marcelo Tosatti writes:
   Hmmm, can't this happen without my patch?
  
  No. We will never call writepage() without __GFP_IO without your patch.
  

I see, because launder_loop never progresses to 1 in that case.

My patch is crap and can cause corruptions, there is not argument
about it now :-)

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Linus Torvalds writes:
  YOUR HEURISTIC IS WRONG!

Please start the conversation this way next time.

  I call that a bug. You don't. Fine.

You made it sound like a data corrupter, a kernel crasher, and that
any bug against a kernel with that patch indicates my patch caused it.
There is an important distinction between this is doing something
silly and this will scramble your disk and crash the kernel.

The latter is the conclusion several people came to.

And I wanted a clarification on this, nothing more.

I wanted this clarification from you _BECAUSE_ the original posting in
this thread saw data corruption which went away after reverting my
patch.  But there is no possible connection between my patch and the
crashes he saw.

Many people have already concluded that Linus says davem's patch is
crap so it must have been causing the corruptions of the original
reporter.

Like you and I, most people are lazy.  And a lazy person is likely
to treat bug goes away with reverting patch plus Linus says the
patch is crap as get rid of the patch, this'll fix the bug.  And
that'll be the end of it, nobody will investigate the true cause of
the problem until it shows up again later for some different reason.

  But that code isn't coming anywhere _close_ to my tree until the two
  match. And I stand by my assertion that it should be reverted from Alans
  tree too.

I have no intent to ever submit that patch to your tree, I know it's
not the right way to solve this problem.

In fact I submitted that patch to you as an [RFC] a long time ago
and it fell on deaf ears.  Or perhaps you happened to have laser eye
surgery that particular day, who knows. :-)

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Linus Torvalds writes:
  What do you expect me to do? The patch is buggy. It should be reverted.
  What's your problem?

I think the problem he has is that you are acting as if the patch
causes corruptions and will end in failures.  This is how you are
coming across, at least.

Really, your problem with the patch seems to be aesthetic in nature
and that the patch is not doing things cleanly at all.  To this I will
not contest, it surely is not the way to fix this in the end.

If my analysis up to this point is correct, the patch should in fact
not be reverted.  Many patches in Alan's tree are not done in the most
aesthetically pleasing way to you, we all know this. But they do solve
problems for people.  Half of Alan's tree should be reverted if we
followed this rule, really.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Marcelo Tosatti writes:
  My point is that its _ok_ for us to check if the page is a dead swap cache
  page _without_ the lock since writepage() will recheck again with the page
  _locked_. Quoting you two messages back: 
  
  But it is important to re-calculate the deadness after getting the lock.
  Before, it was just an informed guess. After the lock, it is knowledge.
  
  See ? 

In fact my patch isn't changing writepage behavior wrt. that page, it
is changing behavior with respect to laundering policy for that page.

Here, let's talk code a little bit so there are no misunderstandings,
I really want to put this to rest:

+   int dead_swap_page;
+
page = list_entry(page_lru, struct page, lru);
 
+   dead_swap_page =
+   (PageSwapCache(page) 
+page_count(page) == (1 + !!page-buffers));
+

Calculate dead_swap_page outside of lock.

/* Page is or was in use?  Move it to the active list. */
-   if (PageTestandClearReferenced(page) || page-age  0 ||
-   (!page-buffers  page_count(page)  1) ||
-   page_ramdisk(page)) {
+   if (!dead_swap_page 
+   (PageTestandClearReferenced(page) || page-age  0 ||
+(!page-buffers  page_count(page)  1) ||
+page_ramdisk(page))) {
del_page_from_inactive_dirty_list(page);
add_page_to_active_list(page);
continue;

If dead_swap_page, ignore referenced bit heuristics.

-   /* First time through? Move it to the back of the list */
-   if (!launder_loop) {
+   /* First time through? Move it to the back of the list,
+* but not if it is a dead swap page. We want to reap
+* those as fast as possible.
+*/
+   if (!launder_loop  !dead_swap_page) {
list_del(page_lru);
list_add(page_lru, inactive_dirty_list);
UnlockPage(page);

If dead_swap_page, ignore launder_loop.  Again, another heuristic
test, not a state correctness test.  launder_loop is not
protecting state correctness of what we do to the page.

Really, what does this have to do with swap counts and page counts?

It's a heuristic. In fact it even seems stupid to me to recalculate
dead_swap_page after we get the lock just for the sake of these
heuristics.

Maybe I should have diguised this bit as:

if (dead_swap_page)
do_writepage_first_pass = 1;

To divert people's brains to what the intent was :-)

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Daniel Phillips

On Monday 07 May 2001 08:26, Tobias Ringstrom wrote:
 On Sun, 6 May 2001, David S. Miller wrote:
  It is the most straightforward way to make a '1' or '0'
  integer from the NULL state of a pointer.

 But is it really specified in the C standards to be exctly zero or
 one, and not zero and non-zero?

Yes, and if we did not have this stupid situation where the C language 
standard is not freely available online then you would not have had to 
ask./rant

 IMHO, the ?: construct is way more readable and reliable.

There is no difference in reliability.  Readability is a matter of 
opinion - my opinion is that they are equally readable.  To its credit, 
gcc produces the same ia32 code in either case:

int foo = 999;
return 1 + !!foo;

main+6:   movl   $0x3e7,0xfffc(%ebp)
main+13:  cmpl   $0x0,0xfffc(%ebp)
main+17:  je 0x80483e0 main+32
main+19:  mov$0x2,%eax
main+24:  jmp0x80483e5 main+37
main+26:  lea0x0(%esi),%esi
main+32:  mov$0x1,%eax
main+37:  mov%eax,%eax

int foo = 999;
return foo? 2: 1;

main+6:   movl   $0x3e7,0xfffc(%ebp)
main+13:  cmpl   $0x0,0xfffc(%ebp)
main+17:  je 0x80483e0 main+32
main+19:  mov$0x2,%eax
main+24:  jmp0x80483e5 main+37
main+26:  lea0x0(%esi),%esi
main+32:  mov$0x1,%eax
main+37:  mov%eax,%eax

--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Linus Torvalds writes:
  
  On Mon, 7 May 2001, Marcelo Tosatti wrote:
   And thats what swap_writepage() is doing:
  
  Ehh.. swap_writepage() is called with the page locked. So it _can_ depend
  on it.
  
  If the page isn't locked there, then THAT is a bug. A major one.

Linus, he's trying to point out that writepage() will recheck (under
lock) what you think my dead_swap_page thing is not checking :-)

My changes always call writepage() and always redo all the checks
under the proper locks.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Marcelo Tosatti writes:
  I just thought about this case:

  We find a dead swap cache page, so dead_swap_page goes to 1.
  
  We call swap_writepage(), but in the meantime the swapin readahead code   
  got a reference on the swap map for the page.
  
  We write the page out because (swap_count(page)  1), and we may
  not have __GFP_IO set in the gfp_mask. Boom.

Hmmm, can't this happen without my patch?

Nothing stops people from getting references to the page
between the Page is or was in use? test and the line
which does TryLockPage(page).

Later,
David S. Miller
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Horst von Brand

David S. Miller [EMAIL PROTECTED] said:
 Jonathan Morton writes:
   -  page_count(page) == (1 + !!page-buffers));
   
   Two inversions in a row?
 
 It is the most straightforward way to make a '1' or '0'
 integer from the NULL state of a pointer.

IMVHO, it is clearer to write:

  page_count(page) == 1 + (page-buffers != NULL)

At least, the original poster wouldn't have wondered, and I wouldn't have
had to think a bit to find out what it meant... If gcc generates worse code
for this, it should be fixed.
-- 
Dr. Horst H. von Brand   mailto:[EMAIL PROTECTED]
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria  +56 32 654239
Casilla 110-V, Valparaiso, ChileFax:  +56 32 797513
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Linus Torvalds


On Mon, 7 May 2001, Marcelo Tosatti wrote:
 
 So what about moving the check for a dead swap cache page from
 swap_writepage() to page_launder() (+ PageSwapCache() check) just before
 the if (!launder_loop) ? 
 
 Yes, its ugly special casing. Any other suggestion ? 

My most favourite approach by far is to just remove the magic for
different writepage's altogether, and just unconditionally do a
writepage. But passing in enough information so that the writepage can
come to the right decision.

So take the old code, and remove the code that does

if (!launder_loop) {
.. move to head ..
continue;
}
writepage(page);

and instead make it do something like

if (writepage(page, launderloop)) {
.. not able to write, move to head ..
continue;
}

where launderloop is passed in to the writepage function as a priority.

Then, a regular writepage() would just do

if (!priority)
return -1;
.. do real writepage here ..
return 0;

while the special casing of dead swap cache entries can be moved into
swap_writepage() (which would do the above _too_, but would first try to
just drop the page if it can tell that the page is dead).

Now, add in the whole accessed recently as part of the priority (ie a
page that had the PG_referenced bit set will always get a priority 0,
even if we could have laundered it, because we don't actually want to
write it out all that badly), and I think you'll get to where David wanted
to get _without_ any special cases.

In fact, it might even clean stuff up. Who knows? At least
page_launder() would not need to know about magic dead swap pages, because
the decision would be entirely in writepage().

And there aren't that many writepage() implementations in the kernel to
fix up (and the fixup tends to be two simple added lines of code for most
of them - just the if (!priority) return).

Also note how some filesystems might use the writepage() callback even
with a zero priority as a hint that things are approaching the point where
we need to start flushing, which might make a difference for deciding when
to try to write a log entry, for example.

Now, I'm not saying this is _the_ solution to it, but I don't see any
really clean alternatives.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Linus Torvalds


On Mon, 7 May 2001, David S. Miller wrote:
 
 Here, let's talk code a little bit so there are no misunderstandings,
 I really want to put this to rest:
 
 Calculate dead_swap_page outside of lock.

NO. That's not what you're doing at all. You're calculating something
completely different that dead swap page. You're calculating do we have
a swap cache entry that is not mapped into any virtual memory?

 If dead_swap_page, ignore referenced bit heuristics.

Which is complete crap. Those reference bits are valid and important
data. You have not computed anything that says otherwise. You have
computed a random number that doesn't tell you anything about whether the
page is dead or not. 

 Really, what does this have to do with swap counts and page counts?
 
 It's a heuristic. In fact it even seems stupid to me to recalculate
 dead_swap_page after we get the lock just for the sake of these
 heuristics.

YOUR HEURISTIC IS WRONG!

 Maybe I should have diguised this bit as:
 
 if (dead_swap_page)
   do_writepage_first_pass = 1;

So tell me: what does the above help?

I repeat: your dead_swap_page variable is a random number with
absolutely no meaning. ANYTHING that uses it is buggy. It doesn't help in
the least if you use the first random state to set another random
state: the amount of randomness does not increase or decrease.

See?

 To divert people's brains to what the intent was :-)

I can see the intent.

I can also see that the code doesn't match up to the intent.

I call that a bug. You don't. Fine.

But that code isn't coming anywhere _close_ to my tree until the two
match. And I stand by my assertion that it should be reverted from Alans
tree too.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Tobias Ringstrom

On Sun, 6 May 2001, David S. Miller wrote:
 It is the most straightforward way to make a '1' or '0'
 integer from the NULL state of a pointer.

But is it really specified in the C standards to be exctly zero or one,
and not zero and non-zero?

IMHO, the ?: construct is way more readable and reliable.

/Tobias

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread David S. Miller


Tobias Ringstrom writes:
  But is it really specified in the C standards to be exctly zero or one,
  and not zero and non-zero?

I'm pretty sure it does.

  IMHO, the ?: construct is way more readable and reliable.

Well identical code has been there for several months just a few lines
away.

I've seen this idiom used in many places (even the GCC sources :-),
so I'm rather surprised people are seeing it for the first time.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Helge Hafting

Tobias Ringstrom wrote:
 
 On Sun, 6 May 2001, David S. Miller wrote:
  It is the most straightforward way to make a '1' or '0'
  integer from the NULL state of a pointer.
 
 But is it really specified in the C standards to be exctly zero or one,
 and not zero and non-zero?

!0 is 1.  !(anything else) is 0.  It is zero and one, not
zero and non-zero.  So a !! construction gives zero if you have
zero, and one if you had anything else.  There's no doubt about it.
 
 IMHO, the ?: construct is way more readable and reliable.

Thats your opinion.  There are many others.  Some don't like the
?: at all, for example.  And some like all valid C.

Helge Hafting
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: page_launder() bug

2001-05-07 Thread Alan Cox

  It is the most straightforward way to make a '1' or '0'
  integer from the NULL state of a pointer.
 
 But is it really specified in the C standards to be exctly zero or one,
 and not zero and non-zero?

Yes. (Fortunately since when this argument occurred Linus said he would eat
his underpants if he was wrong)

Alan

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



  1   2   >