Bug#394392: msync() in recent kernels fails LSB

2006-12-19 Thread Martin Michlmayr
* Jeff Licquia [EMAIL PROTECTED] [2006-10-20 19:17]:
 From a recent run of the LSB 3.1 tests:
 
 10|852 /tset/LSB.os/mfiles/msync_P/T.msync_P 22:58:49|TC Start, scenario ref 858-0
 
 FSG internal testing showed that Fedora Core 5's 2.6.18 kernel does not
 fail in the same way.  I believe I've traced it to a backported change
 from 2.6.19 development.  The specific commit touching msync() is
 204ec841fbea3e5138168edbc3a76d46747cc987 in git; it relies on several
 commits immediately preceding it.  I've built Linus's tree on amd64, and
 it passes the test.  I have not, however, built a 2.6.18 kernel with
 this patch and tested it, though it's the only patch in the Fedora
 kernel which touches the msync() code.

So it seems that the patches needed for msync() conformance, which we
backported from 2.6.19 to our 2.6.18, cause filesystem corruption; see the
current discussion on lkml.  As I understand it, plain 2.6.18 is not
LSB 3.1 conformant, and the fixes it needs are associated with filesystem
corruption.  While Andrew, Linus and co are currently trying to come up
with a patch, I think it might be better for us to simply back out these
patches.  What does it take to get an exception for this LSB test?  Surely
the reasons cited above (the test fails with 2.6.18, a fairly current
kernel, and the patches that fix it are associated with fs corruption) are
pretty good arguments for an exception...
-- 
Martin Michlmayr
http://www.cyrius.com/


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#394392: msync() in recent kernels fails LSB

2006-12-19 Thread Jeff Licquia
On Tue, 2006-12-19 at 17:08 +0100, Martin Michlmayr wrote:
 So it seems that the patches needed for msync() conformance, which we
 backported from 2.6.19 to our 2.6.18, cause filesystem corruption; see the
 current discussion on lkml.  As I understand it, plain 2.6.18 is not
 LSB 3.1 conformant, and the fixes it needs are associated with filesystem
 corruption.  While Andrew, Linus and co are currently trying to come up
 with a patch, I think it might be better for us to simply back out these
 patches.  What does it take to get an exception for this LSB test?  Surely
 the reasons cited above (the test fails with 2.6.18, a fairly current
 kernel, and the patches that fix it are associated with fs corruption) are
 pretty good arguments for an exception...

I brought this up at our weekly conference call, which generated quite a
lot of discussion.  The argument against issuing a waiver is that one
isn't strictly required; a distro could fix the problem by downgrading
the kernel to 2.6.16.

I've also forwarded your message to Ian Murdock, who is the current
chair of the LSB Steering Committee.

The process for getting an exception is as follows:

 - Release a product with a problem.

 - Run the tests, and fail in some way.

 - Request a waiver from the LSB Specification Authority.  There's a
link for doing so from the certification site.

Of course, the problem is that we have to make a decision now, so we
also have an unofficial process of discussing known issues.  That
process has already been started.






Bug#394392: msync() in recent kernels fails LSB

2006-12-19 Thread maximilian attems
On Tue, Dec 19, 2006 at 05:08:24PM +0100, Martin Michlmayr wrote:
 
 So it seems that the patches needed for msync() conformance, which we
 backported from 2.6.19 to our 2.6.18, cause filesystem corruption; see the
 current discussion on lkml.  As I understand it, plain 2.6.18 is not
 LSB 3.1 conformant, and the fixes it needs are associated with filesystem
 corruption.  While Andrew, Linus and co are currently trying to come up
 with a patch, I think it might be better for us to simply back out these
 patches.  What does it take to get an exception for this LSB test?  Surely
 the reasons cited above (the test fails with 2.6.18, a fairly current
 kernel, and the patches that fix it are associated with fs corruption) are
 pretty good arguments for an exception...
 -- 
 Martin Michlmayr
 http://www.cyrius.com/

why not wait for the fix and backport it?!

--
maks





Bug#394392: msync() in recent kernels fails LSB

2006-12-19 Thread Martin Michlmayr
* maximilian attems [EMAIL PROTECTED] [2006-12-19 20:30]:
 why not wait for the fix and backport it?!

Well, have you seen the discussion on lkml in which people are
basically groping in the dark?  I hope there'll be a clean fix
in a few days but...
-- 
Martin Michlmayr
http://www.cyrius.com/





Bug#394392: msync() in recent kernels fails LSB

2006-12-19 Thread maximilian attems
On Tue, Dec 19, 2006 at 11:25:22PM +0100, Martin Michlmayr wrote:
 * maximilian attems [EMAIL PROTECTED] [2006-12-19 20:30]:
  why not wait for the fix and backport it?!
 
 Well, have you seen the discussion on lkml in which people are
 basically groping in the dark?  I hope there'll be a clean fix
 in a few days but...
 -- 
 Martin Michlmayr
 http://www.cyrius.com/

yes, there are 2 working hacks around, so the final fix should come up.
but we shouldn't mould our hands too quickly with the first shot.

-- 
maks
christmas with dj dsl and mieze medusa
- http://www.cabaretrenz.org/programm+M5c858bac7ae.html





Bug#394392: msync() in recent kernels fails LSB

2006-12-19 Thread Steve Langasek
On Tue, Dec 19, 2006 at 05:08:24PM +0100, Martin Michlmayr wrote:
  10|852 /tset/LSB.os/mfiles/msync_P/T.msync_P 22:58:49|TC Start, scenario ref 858-0

  FSG internal testing showed that Fedora Core 5's 2.6.18 kernel does not
  fail in the same way.  I believe I've traced it to a backported change
  from 2.6.19 development.  The specific commit touching msync() is
  204ec841fbea3e5138168edbc3a76d46747cc987 in git; it relies on several
  commits immediately preceding it.  I've built Linus's tree on amd64, and
  it passes the test.  I have not, however, built a 2.6.18 kernel with
  this patch and tested it, though it's the only patch in the Fedora
  kernel which touches the msync() code.

 So it seems that the patches needed for msync() conformance, which we
 backported from 2.6.19 to our 2.6.18, cause filesystem corruption; see the
 current discussion on lkml.  As I understand it, plain 2.6.18 is not
 LSB 3.1 conformant, and the fixes it needs are associated with filesystem
 corruption.  While Andrew, Linus and co are currently trying to come up
 with a patch, I think it might be better for us to simply back out these
 patches.  What does it take to get an exception for this LSB test?  Surely
 the reasons cited above (the test fails with 2.6.18, a fairly current
 kernel, and the patches that fix it are associated with fs corruption) are
 pretty good arguments for an exception...

Reverting this is an ABI change which may be installer-affecting (I don't
know if it is, but unlike most of the other pressing ABI changes this one
would apply to the kernels used by the installer).  If we think the fix may
be available soon, I think we're better off pushing forward rather than
reverting.

If the decision *is* made to revert, release-wise it's best if the kernel
team can bundle up any final ABI changes they want to make for etch at the
same time so that we can get it done with and get d-i RC2 out.

Thanks,
-- 
Steve Langasek   Give me a lever long enough and a Free OS
Debian Developer   to set it on, and I can move the world.
[EMAIL PROTECTED]   http://www.debian.org/





Bug#394392: msync() in recent kernels fails LSB

2006-10-20 Thread Jeff Licquia
Package: linux-image-2.6.17-2-686
Version: 2.6.17-9
Severity: important

From a recent run of the LSB 3.1 tests:

10|852 /tset/LSB.os/mfiles/msync_P/T.msync_P 22:58:49|TC Start, scenario ref 858-0
15|852 3.6-lite 9|TCM Start
400|852 7 1 22:59:13|IC Start
200|852 7 22:59:13|TP Start
520|852 7 8662 1 1|msync() did not return -1, returned 0
220|852 7 1 22:59:13|FAIL
410|852 7 1 22:59:13|IC End
80|852 0 22:59:15|TC End, scenario ref 858-0

The test mmap()'s three pages from a large file read-write, munmap()'s
the middle page, and then tries to msync() the first two pages, both in
synchronous and asynchronous modes.  Both attempts should fail, because
one of the pages in the range is not mapped.  Starting with kernel
2.6.17, at least one of the msync() calls succeeded.  I've confirmed the
failure happens in 2.6.18 i386 kernels, and on powerpc and amd64 with
2.6.17 kernels.
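
For readers who want to reproduce the behaviour without the LSB suite, here
is a minimal sketch of the scenario described above.  It is not the LSB test
itself: the helper name msync_rejects_hole(), the three-page temporary file
under /tmp (the real test uses a large file), and the check for ENOMEM
(the error POSIX specifies for a range that includes unmapped pages) are
all assumptions made for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the LSB test suite.
 * Returns 1 if msync() over a range containing an unmapped page fails
 * with ENOMEM, 0 if it does not, and -1 if the setup itself failed.
 */
int msync_rejects_hole(int flags)
{
    long page = sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/msync_hole_XXXXXX";  /* assumed scratch location */
    char *base;
    int fd, rc, saved_errno;

    assert(page > 0);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                    /* file disappears once fd is closed */

    if (ftruncate(fd, 3 * page) != 0) {
        close(fd);
        return -1;
    }

    /* Map three pages of the file read-write and shared. */
    base = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }
    base[0] = 'x';                   /* dirty the first page */

    /* Punch a hole: unmap only the middle page. */
    if (munmap(base + page, page) != 0) {
        munmap(base, 3 * page);
        close(fd);
        return -1;
    }

    /* Sync the first two pages; the second one is no longer mapped. */
    errno = 0;
    rc = msync(base, 2 * page, flags);
    saved_errno = errno;

    munmap(base, page);
    munmap(base + 2 * page, page);
    close(fd);

    return rc == -1 && saved_errno == ENOMEM;
}
```

Calling msync_rejects_hole(MS_SYNC) and msync_rejects_hole(MS_ASYNC)
mirrors the two modes the test exercises; a kernel showing the failure
reported here would be expected to return 0 for at least one of them.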

I've been able to trace the bug to commit
707c21c848deeb0200ba3f07e4ba90e6dc419c2f in git.

FSG internal testing showed that Fedora Core 5's 2.6.18 kernel does not
fail in the same way.  I believe I've traced it to a backported change
from 2.6.19 development.  The specific commit touching msync() is
204ec841fbea3e5138168edbc3a76d46747cc987 in git; it relies on several
commits immediately preceding it.  I've built Linus's tree on amd64, and
it passes the test.  I have not, however, built a 2.6.18 kernel with
this patch and tested it, though it's the only patch in the Fedora
kernel which touches the msync() code.

The patch from the Fedora kernel is attached.  It is fairly high-impact,
though; if a less invasive patch is needed, please let me know.

Marked important because LSB 3.1 compatibility has been identified as
a release goal.

Date: Wed, 19 Jul 2006 00:03:33 +0200
From: Peter Zijlstra [EMAIL PROTECTED]
Subject: Re: [RHEL5][PATCH 1/8] mm: tracking shared dirty pages

Respin against current Rawhide kernel.

The other patches apply with a little offset/fuzz but end up right.
It even compiles :-)

Don, is this enough, or would you like me to repost the whole series
(minus 8/8) fuzzless?

---


From: Peter Zijlstra [EMAIL PROTECTED]

Tracking of dirty pages in shared writeable mmap()s.

The idea is simple: write protect clean shared writeable pages, catch the
write-fault, make writeable and set dirty.  On page write-back clean all
the PTE dirty bits and write protect them once again.
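
The same write-protect-and-fault idea can be tried out in user space, which
may help when reading the patch.  The sketch below is only an analogy, not
the kernel implementation: it keeps an anonymous region read-only, uses a
SIGSEGV handler as the "write fault", and records a per-page dirty flag.
All names (tracker_init() and friends) are hypothetical, and calling
mprotect() from a signal handler is not guaranteed async-signal-safe by
POSIX, although it works on Linux.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 4

static char *region;                          /* the tracked mapping */
static size_t page_size;
static volatile sig_atomic_t dirty[NPAGES];   /* one dirty flag per page */

/* "Write fault": mark the page dirty, then make it writable. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t start = (uintptr_t)region;
    size_t idx;

    (void)sig;
    (void)ctx;

    if (addr < start || addr >= start + NPAGES * page_size)
        _exit(1);                 /* a genuine crash, not a tracked page */

    idx = (addr - start) / page_size;
    dirty[idx] = 1;
    /* Assumption: mprotect() is usable from a handler on this platform. */
    mprotect(region + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Map the region write-protected and install the fault handler. */
int tracker_init(void)
{
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region = mmap(NULL, NPAGES * page_size, PROT_READ,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* "Write-back": clear every dirty flag and write-protect again. */
void tracker_clean(void)
{
    size_t i;

    for (i = 0; i < NPAGES; i++)
        dirty[i] = 0;
    mprotect(region, NPAGES * page_size, PROT_READ);
}

int tracker_is_dirty(size_t idx)
{
    assert(idx < NPAGES);
    return dirty[idx];
}

/* Volatile so that stores stay ordered against reads of dirty[]. */
volatile char *tracker_page(size_t idx)
{
    assert(idx < NPAGES);
    return region + idx * page_size;
}
```

After tracker_init(), the first store to a page sets its flag via the
handler, later stores to that page run at full speed, and tracker_clean()
re-arms the protection, which parallels what page_mkclean() is described as
doing on write-back.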

The implementation is a tad harder, mainly because the default
backing_dev_info capabilities were too loosely maintained. Hence it is
not enough to test the backing_dev_info for cap_account_dirty.

The current heuristic is as follows, a VMA is eligible when:
 - it's shared writeable
    (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
 - it is not a 'special' mapping
    (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
 - the backing_dev_info is cap_account_dirty
    mapping_cap_account_dirty(vma->vm_file->f_mapping)
 - f_op->mmap() didn't change the default page protection
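
As a reading aid, the four conditions above can be written as a single
predicate.  This is a simplified user-space model, not code from the patch:
struct vma_model, its two boolean fields, the function name, and the flag
values are all stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag bits; the values are illustrative, not kernel headers. */
#define VM_WRITE      0x0002UL
#define VM_SHARED     0x0008UL
#define VM_PFNMAP     0x0400UL
#define VM_INSERTPAGE 0x2000UL

/* Hypothetical, simplified model of the VMA state the heuristic reads. */
struct vma_model {
    unsigned long vm_flags;
    bool backing_accounts_dirty;  /* models mapping_cap_account_dirty() */
    bool default_page_prot;       /* mmap() left the page protection alone */
};

/* True when all four eligibility conditions from the list hold. */
bool vma_wants_dirty_tracking(const struct vma_model *vma)
{
    assert(vma != NULL);

    /* 1. shared and writeable: both bits must be set */
    if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) != (VM_WRITE | VM_SHARED))
        return false;

    /* 2. not a 'special' mapping */
    if ((vma->vm_flags & (VM_PFNMAP | VM_INSERTPAGE)) != 0)
        return false;

    /* 3. the backing device accounts dirty pages */
    if (!vma->backing_accounts_dirty)
        return false;

    /* 4. the driver did not change the default page protection */
    return vma->default_page_prot;
}
```

Note that the first test compares against the full mask rather than just
checking for a non-zero result: a private writeable mapping has VM_WRITE
set but must not be tracked.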

Pages from remap_pfn_range() are explicitly excluded because their
COW semantics are already horrid enough (see vm_normal_page() in
do_wp_page()) and because they don't have a backing store anyway.

mprotect() is taught about the new behaviour as well. However it
fudges the last condition.

Cleaning the pages on write-back is done with page_mkclean(), a new
rmap call. It cleans and wrprotects all PTEs of dirty accountable
pages.

Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty()
from under ->private_lock. This seems to be safe, since ->private_lock
is used to serialize access to the buffers, not the page itself.
This is needed because clear_page_dirty() will call into page_mkclean()
and would thereby violate locking order.

Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
Cc: Hugh Dickins [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 fs/buffer.c          |    2
 include/linux/mm.h   |   34
 include/linux/rmap.h |    8
 mm/memory.c          |   29
 mm/mmap.c            |   10
 mm/mprotect.c        |   21
 mm/page-writeback.c  |   17
 mm/rmap.c            |   65
 8 files changed, 156 insertions(+), 30 deletions(-)

Index: latest/fs/buffer.c
===
--- latest.orig/fs/buffer.c
+++ latest/fs/buffer.c
@@ -2984,6 +2984,7 @@ int try_to_free_buffers(struct page *pag
 
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
+	spin_unlock(&mapping->private_lock);
 	if (ret) {
 		/*
 		 * If the filesystem writes its buffers by hand (eg ext3)
@@ -2995,7 +2996,6 @@ int try_to_free_buffers(struct page *pag
 		 */
 		clear_page_dirty(page);
 	}
-	spin_unlock(&mapping->private_lock);
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
Index: latest/include/linux/mm.h