Re: [patch 5/5] Optimize page_mkclean_one

2007-07-02 Thread Martin Schwidefsky
On Sun, 2007-07-01 at 15:27 +0200, Peter Zijlstra wrote:
> > But I could easily be overlooking something: Peter will recall.
> 
> /me tries to get his brain up to speed after the OLS closing party :-)

Oh-oh, the Black Thorn party :-)

> I did both pte_dirty and pte_write because I was extra careful. One
> _should_ imply the other, but since we'll be clearing both, I thought it
> prudent to also check both.

Just ran a little experiment: I've added a simple WARN_ON(ret == 0) to
page_mkclean after the page_test_dirty() check to see if there are cases
where the page is dirty and all ptes are read-only. A little stress run
including massive swap did not print a single warning.

> I will have to think on this a little more, but I'm currently of the
> opinion that the optimisation is not correct. But I'll have a thorough
> look at s390 again when I get home.

I think the patch is correct, although I beginning to doubt that is has
any effect.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 5/5] Optimize page_mkclean_one

2007-07-01 Thread Martin Schwidefsky
On Sun, 2007-07-01 at 09:54 +0100, Hugh Dickins wrote:
> On Sun, 1 Jul 2007, Martin Schwidefsky wrote:
> > > 
> > > Expect you're right, but I _really_ don't want to comment, when I don't
> > > understand that "|| pte_write" in the first place, and don't know the
> > > consequence of pte_dirty && !pte_write or !pte_dirty && pte_write there.
> > 
> > The pte_write() part is for the shared dirty page tracking. If you want
> > to make sure that a max of x% of your pages are dirty then you cannot
> > allow to have more than x% to be writable. Thats why page_mkclean_one
> > clears the dirty bit and makes the page read-only.
> 
> The whole of page_mkclean_one is for the dirty page tracking: so it's
> obvious why it tests pte_dirty, but not obvious why it tests pte_write.

Yes, the pte_write call is needed for shared dirty page tracking. In
particular its needed for s390 but for corner cases where a page is
writable but not dirty it might be needed for other architectures as
well.

> > > My suspicion is that the "|| pte_write" is precisely to cover your
> > > s390 case where pte is never dirty (it may even have been me who got
> > > Peter to put it in for that reason).  In which case your patch would
> > > be fine - though I think it'd be improved a lot by a comment or
> > > rearrangement or new macro in place of the pte_dirty || pte_write
> > > line (perhaps adjust my pte_maybe_dirty in asm-generic/pgtable.h,
> > > and use that - its former use in msync has gone away now).
> > 
> > No, s390 is covered by the page_test_dirty / page_clear_dirty pair in
> > page_mkclean. 
> 
> That's where its dirty page count comes from, yes: but since the s390
> pte_dirty just says no, if page_mkclean_one tested only pte_dirty,
> then it wouldn't do anything on s390, and in particular wouldn't
> write protect the ptes to re-enforce dirty counting from then on.

Yes, I definitly agree that the pte_write is required for s390.

> So in answering your denials, I grow more confident that the pte_write
> test is precisely for the s390 case.  Though it might also be to cover
> some defect in the write-protection scheme on other arches.

Well, here I'm not so sure. You need the implication
  pte_write() == true -> pte_dirty() == true
to be able to skip the pte_write check for architectures that keep their
dirty bits in the pte. Is this really true for all corner-cases?

> Come to think of it, would your patch really make any difference?
> Although page_mkclean's "count" of dirty ptes on s390 will be nonsense,
> that count would anyway be unknown, and it's only used as a boolean;
> and now I don't think your patch changes the boolean value - if any
> pte is found writable (and if the scheme is working) that implies
> that the page was written to, and so should give the same answer
> as the page_test_dirty.

It depends on code outside of pte_mkclean_one if the patch makes a
difference or not. The additional check for pte_dirty will make the
function less suble as it will not depends on code outside of it.
With the additional check for pte_dirty the function does the following:
1) make the pte clean, 2) make the pte read-only, 3) return true if the
pte has been marked dirty.
Without the check the function does 1), 2) as above and 3) return true
if the pte has been marked dirty or has been writable.
I find it easier to understand the semantics of the function with the
additional check. But that is only me ..

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 5/5] Optimize page_mkclean_one

2007-07-01 Thread Peter Zijlstra
On Sun, 2007-07-01 at 09:54 +0100, Hugh Dickins wrote:

> But I could easily be overlooking something: Peter will recall.

/me tries to get his brain up to speed after the OLS closing party :-)

I did both pte_dirty and pte_write because I was extra careful. One
_should_ imply the other, but since we'll be clearing both, I thought it
prudent to also check both.

I will have to think on this a little more, but I'm currently of the
opinion that the optimisation is not correct. But I'll have a thorough
look at s390 again when I get home.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 5/5] Optimize page_mkclean_one

2007-07-01 Thread Miklos Szeredi
> page_mkclean_one is used to clear the dirty bit and to set the write
> protect bit of a pte. In additions it returns true if the pte either
> has been dirty or if it has been writable. As far as I can see the
> function should return true only if the pte has been dirty, or page
> writeback will needlessly write a clean page.

There are some weird cases, like for example get_user_pages(), when
the pte takes a write fault and the page is modified, but the pte
doesn't become dirty, because the page is written through the kernel
mapping.

In the get_user_pages() case the page itself is dirtied, so your patch
probably doesn't break that.  But I'm not sure if there aren't similar
cases like that that the pte_write() check is taking care of.

And anyway if the dirty page tracking works correctly, your patch
won't optimize anything, since the pte will _only_ become writable if
the page was dirtied.

So in fact normally pte_dirty() and pte_write() should be equivalent,
except for some weird cases.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 5/5] Optimize page_mkclean_one

2007-07-01 Thread Hugh Dickins
On Sun, 1 Jul 2007, Martin Schwidefsky wrote:
> > 
> > Expect you're right, but I _really_ don't want to comment, when I don't
> > understand that "|| pte_write" in the first place, and don't know the
> > consequence of pte_dirty && !pte_write or !pte_dirty && pte_write there.
> 
> The pte_write() part is for the shared dirty page tracking. If you want
> to make sure that a max of x% of your pages are dirty then you cannot
> allow to have more than x% to be writable. Thats why page_mkclean_one
> clears the dirty bit and makes the page read-only.

The whole of page_mkclean_one is for the dirty page tracking: so it's
obvious why it tests pte_dirty, but not obvious why it tests pte_write.

> 
> > My suspicion is that the "|| pte_write" is precisely to cover your
> > s390 case where pte is never dirty (it may even have been me who got
> > Peter to put it in for that reason).  In which case your patch would
> > be fine - though I think it'd be improved a lot by a comment or
> > rearrangement or new macro in place of the pte_dirty || pte_write
> > line (perhaps adjust my pte_maybe_dirty in asm-generic/pgtable.h,
> > and use that - its former use in msync has gone away now).
> 
> No, s390 is covered by the page_test_dirty / page_clear_dirty pair in
> page_mkclean. 

That's where its dirty page count comes from, yes: but since the s390
pte_dirty just says no, if page_mkclean_one tested only pte_dirty,
then it wouldn't do anything on s390, and in particular wouldn't
write protect the ptes to re-enforce dirty counting from then on.

So in answering your denials, I grow more confident that the pte_write
test is precisely for the s390 case.  Though it might also be to cover
some defect in the write-protection scheme on other arches.

Come to think of it, would your patch really make any difference?
Although page_mkclean's "count" of dirty ptes on s390 will be nonsense,
that count would anyway be unknown, and it's only used as a boolean;
and now I don't think your patch changes the boolean value - if any
pte is found writable (and if the scheme is working) that implies
that the page was written to, and so should give the same answer
as the page_test_dirty.

But I could easily be overlooking something: Peter will recall.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 5/5] Optimize page_mkclean_one

2007-07-01 Thread Martin Schwidefsky
On Sat, 2007-06-30 at 15:04 +0100, Hugh Dickins wrote:
> > Oh yes, the dirty handling is tricky. I had to fix a really nasty bug
> > with it lately. As for page_mkclean_one the difference is that it
> > doesn't claim a page is dirty if only the write protect bit has not been
> > set. If we manage to lose dirty bits from ptes and have to rely on the
> > write protect bit to take over the job, then we have a different problem
> > altogether, no ?
> 
> [Moving that over from 1/5 discussion].
> 
> Expect you're right, but I _really_ don't want to comment, when I don't
> understand that "|| pte_write" in the first place, and don't know the
> consequence of pte_dirty && !pte_write or !pte_dirty && pte_write there.

The pte_write() part is for the shared dirty page tracking. If you want
to make sure that a max of x% of your pages are dirty then you cannot
allow to have more than x% to be writable. Thats why page_mkclean_one
clears the dirty bit and makes the page read-only.

> My suspicion is that the "|| pte_write" is precisely to cover your
> s390 case where pte is never dirty (it may even have been me who got
> Peter to put it in for that reason).  In which case your patch would
> be fine - though I think it'd be improved a lot by a comment or
> rearrangement or new macro in place of the pte_dirty || pte_write
> line (perhaps adjust my pte_maybe_dirty in asm-generic/pgtable.h,
> and use that - its former use in msync has gone away now).

No, s390 is covered by the page_test_dirty / page_clear_dirty pair in
page_mkclean. 

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 5/5] Optimize page_mkclean_one

2007-06-30 Thread Hugh Dickins
On Fri, 29 Jun 2007, Martin Schwidefsky wrote:
> On Fri, 2007-06-29 at 19:56 +0100, Hugh Dickins wrote:
> > I don't dare comment on your page_mkclean_one patch (5/5),
> > that dirty page business has grown too subtle for me.
> 
> Oh yes, the dirty handling is tricky. I had to fix a really nasty bug
> with it lately. As for page_mkclean_one the difference is that it
> doesn't claim a page is dirty if only the write protect bit has not been
> set. If we manage to lose dirty bits from ptes and have to rely on the
> write protect bit to take over the job, then we have a different problem
> altogether, no ?

[Moving that over from 1/5 discussion].

Expect you're right, but I _really_ don't want to comment, when I don't
understand that "|| pte_write" in the first place, and don't know the
consequence of pte_dirty && !pte_write or !pte_dirty && pte_write there.
Peter?

My suspicion is that the "|| pte_write" is precisely to cover your
s390 case where pte is never dirty (it may even have been me who got
Peter to put it in for that reason).  In which case your patch would
be fine - though I think it'd be improved a lot by a comment or
rearrangement or new macro in place of the pte_dirty || pte_write
line (perhaps adjust my pte_maybe_dirty in asm-generic/pgtable.h,
and use that - its former use in msync has gone away now).

Hugh

On Fri, 29 Jun 2007, Martin Schwidefsky wrote:

> page_mkclean_one is used to clear the dirty bit and to set the write
> protect bit of a pte. In additions it returns true if the pte either
> has been dirty or if it has been writable. As far as I can see the
> function should return true only if the pte has been dirty, or page
> writeback will needlessly write a clean page.
> 
> Signed-off-by: Martin Schwidefsky <[EMAIL PROTECTED]>
> ---
> 
>  mm/rmap.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff -urpN linux-2.6/mm/rmap.c linux-2.6-patched/mm/rmap.c
> --- linux-2.6/mm/rmap.c   2007-06-29 09:58:33.0 +0200
> +++ linux-2.6-patched/mm/rmap.c   2007-06-29 15:44:58.0 +0200
> @@ -433,11 +433,12 @@ static int page_mkclean_one(struct page 
>  
>   flush_cache_page(vma, address, pte_pfn(*pte));
>   entry = ptep_clear_flush(vma, address, pte);
> + if (pte_dirty(entry))
> + ret = 1;
>   entry = pte_wrprotect(entry);
>   entry = pte_mkclean(entry);
>   set_pte_at(mm, address, pte, entry);
>   lazy_mmu_prot_update(entry);
> - ret = 1;
>   }
>  
>   pte_unmap_unlock(pte, ptl);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/