subject:"Re\: 2.4.0\-test10\-pre3\:Oops in mm\/filemap.c\:filemap_write

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-24 Thread Alexander Viro

On Tue, 24 Oct 2000, Petr Vandrovec wrote:

> On 23 Oct 00 at 23:05, Alexander Viro wrote:
> 
> > Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what
> > analysis had been done? Petr, can you reproduce the problem on -test7?
> > Unfortunately, clean test would take the backport of ext2 changes
> > (truncate-related, happened around the same time), but IIRC -test7 was
> > relatively stable...
> 
> Just FYI - I was able to reproduce it with 2.4.0-test5 and all laters
> (different pre-s of test6/7/8/9/10) I have. I was not able to reproduce 
> it with 2.2.17-pre13 (although this kernel had really bad behavior when
> I run out of space with backing file - sendmail suspended itself due
> to load of 37... on machine which has usually load 0.01). 
> 
> I do not have kernels between 2.2.17 and 2.4.0-test5 installed.
> 
> If you have anything I should check...

I see the race that can lead to screwups you've seen, but I don't see a
really clear way to close it.

Race in question:
truncate_inode_pages() removes the page from pagecache.
sync_pte is called before we get to killing the pagetable
references.

Potentially, the following might be more or less OK, but I'm not too
happy with it:
* All ->nopage() methods mark page uptodate (upon success, that
is).
* do_no_page() fails if page is not uptodate.
* vmtruncate starts with setting the size and going through the
pagecache, removing all fully truncated pages from the hash chains and
gathering them in the list and removing uptodate flag.
* then it does vmtruncate_list() calls, zapping the pagetable
references to the pages in question.
* *then* it goes through the list and does the rest of
truncate_inode_pages() work.

The theory being:
once the i_size is set, no new pages will be bound to affected
part of file.
when we unhash all affected pages and mark them non-uptodate there
will be no new pagetable references to them (they will not be found, and
any pagefault in progress will fail since we mark them non-uptodate).
any possible IO-in-progress on these pages will not be disturbed -
it doesn't depend on the page being hashed. Aliases do not matter, since
they won't be bound to the relevant part of file anyway.
once we kill all pagetable references (and no new references are
going to appear at that point) we can be sure that no process will
initiate _any_ operations on these pages. Notice that potential dirty data
on them doesn't matter - it's going to die anyway (file is being
truncated).
*now* we grab the page lock on each of these pages, thus letting
any method-in-progress to be completed and that point we can safely kill
->mapping.

Linus, could you comment on that variant?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-24 Thread Petr Vandrovec

On 23 Oct 00 at 23:05, Alexander Viro wrote:

> Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what
> analysis had been done? Petr, can you reproduce the problem on -test7?
> Unfortunately, clean test would take the backport of ext2 changes
> (truncate-related, happened around the same time), but IIRC -test7 was
> relatively stable...

Just FYI - I was able to reproduce it with 2.4.0-test5 and all laters
(different pre-s of test6/7/8/9/10) I have. I was not able to reproduce 
it with 2.2.17-pre13 (although this kernel had really bad behavior when
I run out of space with backing file - sendmail suspended itself due
to load of 37... on machine which has usually load 0.01). 

I do not have kernels between 2.2.17 and 2.4.0-test5 installed.

If you have anything I should check...
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro




On Mon, 23 Oct 2000, Linus Torvalds wrote:

> Note that if there really are only 9 "nopage" routines, then it is a lot
> easier to just add the single "SetPageUptodate(page)" into those 9
> routines, and thus let the VM know of the race.

Works for me. And yes, the list of ->nopage instances that can return
success is that short:
drm_vm_shm_nopage
drm_vm_shm_nopage_lock
drm_vm_dma_nopage
sgi_graphics_nopage
via_mm_nopage
ncp_file_mmap_nopage
shm_nopage
shmzero_nopage
filemap_nopage
- sorry, it's not 9+filemap_nopage, it's 8+filemap_nopage.

However, it still leaves a window for the race: we invalidate first and
remove from pagetables later. And ClearPageUptodate() is obviously
truncate_inode_pages() work. Grr...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Tue, 24 Oct 2000, Alexander Viro wrote:
> 
> It's not the only problem, but I would feel _much_ safer if pagefault
> wouldn't rely on pagecache miss. Actually... Hey. Why don't we do the
> insertion into page tables _within_ ->nopage()?

NO!

We used to do this a LOONG time ago.

Distributing the VM information on how to properly insert a pte into the
page table into drivers is _NOT_ something I want to do again. It was a
pain to get it properly modularized, we're not going back to the horror.

Note that if there really are only 9 "nopage" routines, then it is a lot
easier to just add the single "SetPageUptodate(page)" into those 9
routines, and thus let the VM know of the race.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro

On Mon, 23 Oct 2000, Linus Torvalds wrote:

> 
> 
> On Mon, 23 Oct 2000, Alexander Viro wrote:
> > 
> > Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what
> > analysis had been done? Petr, can you reproduce the problem on -test7?
> 
> I don't think that is it - that code looks very straightforward (and is
> needed on some silly architectures that cannot easily otherwise see if
> they need to be coherent wrt user space - mainly sparc and virtual
> caches).

OK, I see where the race can happen. Yes, vmtruncate() tries to kill the
mappings. Right. However, it does that _after_ truncate_inode_pages().
And there is a window when the only lock we are holding is ->i_sem.
sync_pte in that window ==> we are fucked.

So the question being: WTF do we postpone zapping the page tables until
after the truncate_inode_pages()? The following rules might make life
simpler, AFAICS:
* as soon as ->i_size is set, no new pagetable references to
off-limits pages can appear.
* as soon as we are don with vmtruncate_list() there is no
pagetable references.
* truncate_inode_pages() never has to deal with pages refered from
pagetables.
* ->i_size can't increase until we return from vmtruncate().

It's not the only problem, but I would feel _much_ safer if pagefault
wouldn't rely on pagecache miss. Actually... Hey. Why don't we do the
insertion into page tables _within_ ->nopage()? Look: let's take the tail
of do_no_page() into helper function and just call it from the end of
every bloody ->nopage() out there. It _is_ easy: we have only 9 instances
in the tree not counting filemap_nopage(). Moreover,
do_anonymous_page() will become symmetrical to the rest of the crowd.
Comments?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Alexander Viro wrote:
> 
> Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what
> analysis had been done? Petr, can you reproduce the problem on -test7?

I don't think that is it - that code looks very straightforward (and is
needed on some silly architectures that cannot easily otherwise see if
they need to be coherent wrt user space - mainly sparc and virtual
caches).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro




Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what
analysis had been done? Petr, can you reproduce the problem on -test7?
Unfortunately, clean test would take the backport of ext2 changes
(truncate-related, happened around the same time), but IIRC -test7 was
relatively stable...


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro

On Mon, 23 Oct 2000, Linus Torvalds wrote:

> Also, the fact that Petr didn't see anything trigger in nopage() makes me
> nervous again. Even if the problem happened during read-ahead, it should
> have gotten into the address space only through nopage. Maybe there is
> some vma that isn't added to the right inode VM list - so that we end up
> missing part of the vmtruncate() stuff?
> 
> Al, any ideas?

Just one: let's slow down a bit and try to write down the rules. Close to
release or not, let's understand WTF happens in that part of VM before
deciding on the choice of band-aids/fixes. Frankly, right now I don't feel
that area - too many changes during the last month. So I'm going to sit
down and read it through. IMO it's worth the delay - I have a gut feeling
that band-aids are going to cost more.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Linus Torvalds wrote:
> 
> I'm starting to suspect that we leave this path as-is, and just fix the
> mapping case (and PageUptodate() can work there). That should also avoid
> the nasties.

..and even that looks like I'd have to do the quick-and-dirty case with
the race still there on SMP. Adding the page Uptodate logic to the VM
layer proper is too painful at this point - every single nopage function
would have to be updated to mark its page up-to-date, as they don't
generally do that currently.

Also, the fact that Petr didn't see anything trigger in nopage() makes me
nervous again. Even if the problem happened during read-ahead, it should
have gotten into the address space only through nopage. Maybe there is
some vma that isn't added to the right inode VM list - so that we end up
missing part of the vmtruncate() stuff?

Al, any ideas?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> With ClearPageDirty() kernel locked up (but no watchdog, so probably
> some livelock) during bootup after fsck /

Actually, it turns out that even with this issue fixed, there's the more
serious issue that the page _has_ to be removed from the page cache once
we get to the point that we're freeing the whole inode (which also calls
truncate_inode_pages).

Which means that we cannot take the easy way out and say "let's delay the
freeing until everything is ok" - even if a buffer is busy due to some
unfortunate timing (so that we turn the page into anonymous buffers), we'd
better get rid of the page from the page cache.

I'm starting to suspect that we leave this path as-is, and just fix the
mapping case (and PageUptodate() can work there). That should also avoid
the nasties.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> With ClearPageDirty() kernel locked up (but no watchdog, so probably
> some livelock) during bootup after fsck /.

Yeah, the way the truncate logic works right now truncate_whole_page() has
to remove the page from the inode list - otherwise truncate ends up
looping forever trying to truncate that page ;).

And there was a off-by-one error in my first untested version anyway: the
page_count() should be tested against "2", not "1", as the truncate logic
has elevated the count anyway (probably unnecessarily: once we get the
page lock nobody else can race to remove it from the page cache anyway, so
it's not as if it could go away).

> Should I try ClearPageUptodate() instead?

No, I'll have to fix the truncate logic to allow for this all.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec

On 23 Oct 00 at 14:34, Linus Torvalds wrote:
> On Mon, 23 Oct 2000, Alexander Viro wrote:
> > On Mon, 23 Oct 2000, Linus Torvalds wrote:
> > > 
> > > Nope, that just makes the race window smaller. We should check for i_size
> > > after we've gotten the page table lock and just before actually entering
> > > the page into the page tables. Otherwise we'll still race on SMP (a _very_
> > > hard window to get into, admittedly).
> > 
> > Umm... I would probably remove Uptodate upon truncate() and check _that_
> > in the place you've mentioned.
> 
> Works for me..
> 
> > > ClearPageDirty(page);
> >   ClearPageUptodate(page);
> > 
> > How about that?
> 
> Makes sense. 

With ClearPageDirty() kernel locked up (but no watchdog, so probably
some livelock) during bootup after fsck /. Unfortunately, 'PC' column
shows complete garbage - init,kswapd,kupdate and rcS in 'R', with
'PC'=current for rcS, 0xc1459f28 for init, 0xc146ffa8 for kswapd and
0xc1469fc8 for kupdate (needless to say that my kernel does not have
20MB... whee what's with PC? it now prints esp instead of eip?). Should 
I try ClearPageUptodate() instead?

Petr Vandrovec
[EMAIL PROTECTED]

P.S.: I have to go home. Continuing after 10 hours.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Alexander Viro wrote:
> 
> On Mon, 23 Oct 2000, Linus Torvalds wrote:
> > 
> > Nope, that just makes the race window smaller. We should check for i_size
> > after we've gotten the page table lock and just before actually entering
> > the page into the page tables. Otherwise we'll still race on SMP (a _very_
> > hard window to get into, admittedly).
> 
> Umm... I would probably remove Uptodate upon truncate() and check _that_
> in the place you've mentioned.

Works for me..

> > ClearPageDirty(page);
>   ClearPageUptodate(page);
> 
> How about that?

Makes sense. 

Note that I'd actually like to hear from Petr first _without_ any added
code in the nopage() handler - the issue of having a page mapped after
nopage() that we shouldn't have mapped is a separate one, and I'd first
like to hear if the problem really goes away even if the mapping bug is
still there..

And _then_ we fix the fact that we should not allow anybody to have a page
mapped past i_size.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> Yes. With sleep(60) no oops occur (it takes ~45 secs to exit child). 
> This signals to me: should not vmtruncate_list acquire mm->mmap_sem, 
> if it modifies page tables?

No. 

It should get the page_table lock, but that is sufficient for anybody who
_clears_ page tables (and is pretty much the same case as paging something
out of somebody elses page tables).

>I cannot find anything what prevents doing 
> vmtruncate in one task and filemap_sync in another - neither
> page_table_lock spinlock, nor mmap_sem semaphore...

filemap_sync() does hold the page table lock. 

(Which is certainly not to say that it's necessarily bug-free, but I don't
see any obvious problems off-hand).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro




On Mon, 23 Oct 2000, Linus Torvalds wrote:

> On Mon, 23 Oct 2000, Alexander Viro wrote:
> > 
> > That's fine, but I'm afraid that we'll need a bit more than that. A couple of
> > obvious ones:
> > * filemap_nopage() needs the second check for ->i_size. Upon exit.
> 
> Nope, that just makes the race window smaller. We should check for i_size
> after we've gotten the page table lock and just before actually entering
> the page into the page tables. Otherwise we'll still race on SMP (a _very_
> hard window to get into, admittedly).

Umm... I would probably remove Uptodate upon truncate() and check _that_
in the place you've mentioned.

> So how about this truncate_complete_page() implementation:
> 
>   /*
>* Try to get rid of a page.. Clear it if it fails
>* for some reason. The page must be locked upon calling
>* this function.
>*
>* We remove the page from the page cache _after_ we have
>* destroyed all buffer-cache references to it. Otherwise some
>* other process might think this inode page is not in the
>* page cache and creates a buffer-cache alias to it causing
>* all sorts of fun problems ...
>*/
>   static inline void truncate_complete_page(struct page *page)
>   {
>   /* Try to get rid of buffers */
>   if (page->buffers)
>   block_flushpage(page, 0);
>   
>   spin_lock(&pagecache_lock);
>   spin_lock(&pagemap_lru_lock);
>   
>   if (page_count(page) != 1) {
>   memclear_highpage_flush(page, 0, PAGE_CACHE_SIZE);
>   } else {
>   ClearPageDirty(page);
ClearPageUptodate(page);

How about that?

>   __lru_cache_del(page);
>   __remove_inode_page(page);
>   page_cache_release(page);
>   }
>   spin_unlock(&pagemap_lru_lock);
>   spin_unlock(&pagecache_lock);
>   }
> 
> we should probably special-case the "block_flushpage()" failed case, but
> the above should do reasonable things with it (because page_count() will
> be > 1 due to buffers).

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Alexander Viro wrote:
> 
> That's fine, but I'm afraid that we'll need a bit more than that. A couple of
> obvious ones:
>   * filemap_nopage() needs the second check for ->i_size. Upon exit.

Nope, that just makes the race window smaller. We should check for i_size
after we've gotten the page table lock and just before actually entering
the page into the page tables. Otherwise we'll still race on SMP (a _very_
hard window to get into, admittedly).

> Moreover, what is (area->vm_mm == current->mm) doing in the existing check?

It's for ptrace. You can do ugly things with ptrace that aren't possible
for the process itself.

>   * truncate() should zero the page out if it doesn't remove it from
> cache.

So how about this truncate_complete_page() implementation:

/*
 * Try to get rid of a page.. Clear it if it fails
 * for some reason. The page must be locked upon calling
 * this function.
 *
 * We remove the page from the page cache _after_ we have
 * destroyed all buffer-cache references to it. Otherwise some
 * other process might think this inode page is not in the
 * page cache and creates a buffer-cache alias to it causing
 * all sorts of fun problems ...
 */
static inline void truncate_complete_page(struct page *page)
{
/* Try to get rid of buffers */
if (page->buffers)
block_flushpage(page, 0);

spin_lock(&pagecache_lock);
spin_lock(&pagemap_lru_lock);

if (page_count(page) != 1) {
memclear_highpage_flush(page, 0, PAGE_CACHE_SIZE);
} else {
ClearPageDirty(page);
__lru_cache_del(page);
__remove_inode_page(page);
page_cache_release(page);
}
spin_unlock(&pagemap_lru_lock);
spin_unlock(&pagecache_lock);
}

we should probably special-case the "block_flushpage()" failed case, but
the above should do reasonable things with it (because page_count() will
be > 1 due to buffers).

The above is obviously completely and utterly untested. Petr? Willing to
give this a go?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec

On 23 Oct 00 at 13:57, Linus Torvalds wrote:
> On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> > First page->mapping == NULL entry in syslog is dated 22:23:58, but
> > couple of entries was lost before (probably I should print only '.' for
> > each such page; this run there was more than 100 such pages)
> > Another question is why SIGCHLD was delivered to parent AFTER ftruncate,
> > but exit was invoked couple of seconds before - maybe it syncs
> > child address space to disk?
> 
> exit() basically does do a msync(MSASYNC), so that could be it. 

Yes. With sleep(60) no oops occur (it takes ~45 secs to exit child). 
This signals to me: should not vmtruncate_list acquire mm->mmap_sem, 
if it modifies page tables? I cannot find anything what prevents doing 
vmtruncate in one task and filemap_sync in another - neither
page_table_lock spinlock, nor mmap_sem semaphore...
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro


On Mon, 23 Oct 2000, Linus Torvalds wrote:

> On Mon, 23 Oct 2000, Alexander Viro wrote:
> 
> > On Mon, 23 Oct 2000, Linus Torvalds wrote:
> > 
> > > Al, any ideas? I have this feeling that the simplest fix is just to leave
> > > the race open, and make truncate_complete_page() just leave such a "racy"
> > > page in the page cache. It will still race, and the invalid page will
> > > still exist, but the end result should be harmless.
> > 
> > Provided that we clean it - why the hell do we want to take it out of
> > the pagecache? I don't see any fundamental reasons to prohibit pages
> > past the ->i_size being hashed.
> 
> In fact, we used to allow them.
> 
> We do want to remove it from the page cache under normal circumstances: it
> makes for much better MM behaviour (ie free pages that are truly useless).
> But I suspect that just adding a test for "page_count(page) == 1" (ie same
> as for the page cache invalidation) before freeing it should give that
> advantage, along with avoiding the problematic unlikely case...

That's fine, but I'm afraid that we'll need a bit more than that. A couple of
obvious ones:
* filemap_nopage() needs the second check for ->i_size. Upon exit.
Moreover, what is (area->vm_mm == current->mm) doing in the existing check?
* truncate() should zero the page out if it doesn't remove it from
cache.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> Yes. Bad news. No problem was catched in filemap_nopage, but one
> (of 57000) pages was dirty and had page->mapping == NULL... (maybe
> only one was caused that this was just after bootup, with plenty of memory)
> Maybe I should look at readahead code?

Yes, you're right, read-ahead is in fact even more likely to catch the
race.

> In case of truncate it is going irrevocable away. Accesses after truncate
> should (and sometime give you) SIGBUS...

Yes. But I'd rather have a small race where we under certain strange
circumstances forget to raise a SIGBUS than have a kernel bug that causes
oops and possible filesystem corruption. 

So I'm not just looking for a fix, I'm also looking for a SIMPLE fix, with
perhaps some major surgery later (things like making the inode semaphore a
read-write semaphore and just always synchronizing 100% on i_size - which
would fix the problem, but is not a 2.4.x thing, no way Jose).

> First page->mapping == NULL entry in syslog is dated 22:23:58, but
> couple of entries was lost before (probably I should print only '.' for
> each such page; this run there was more than 100 such pages)
> Another question is why SIGCHLD was delivered to parent AFTER ftruncate,
> but exit was invoked couple of seconds before - maybe it syncs
> child address space to disk?

exit() basically does do a msync(MSASYNC), so that could be it. 

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Alexander Viro wrote:

> On Mon, 23 Oct 2000, Linus Torvalds wrote:
> 
> > Al, any ideas? I have this feeling that the simplest fix is just to leave
> > the race open, and make truncate_complete_page() just leave such a "racy"
> > page in the page cache. It will still race, and the invalid page will
> > still exist, but the end result should be harmless.
> 
> Provided that we clean it - why the hell do we want to take it out of
> the pagecache? I don't see any fundamental reasons to prohibit pages
> past the ->i_size being hashed.

In fact, we used to allow them.

We do want to remove it from the page cache under normal circumstances: it
makes for much better MM behaviour (ie free pages that are truly useless).
But I suspect that just adding a test for "page_count(page) == 1" (ie same
as for the page cache invalidation) before freeing it should give that
advantage, along with avoiding the problematic unlikely case...

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec


On 23 Oct 00 at 16:13, Alexander Viro wrote:
> On Mon, 23 Oct 2000, Linus Torvalds wrote:
> 
> > Al, any ideas? I have this feeling that the simplest fix is just to leave
> > the race open, and make truncate_complete_page() just leave such a "racy"
> > page in the page cache. It will still race, and the invalid page will
> > still exist, but the end result should be harmless.
> 
> Provided that we clean it - why the hell do we want to take it out of
> the pagecache? I don't see any fundamental reasons to prohibit pages
> past the ->i_size being hashed. Methods _must_ check for ->i_size,
> but they do it anyway. All race-prevention is based on page locks and
> ->i_sem.
> 
> Yes, filemap_nopage() should check for i_size at the very end and fail if
> the page became off-limits. But that's completely unrelated issue - it's
> mmap semantics, not pagecache one.

Yes. Bad news. No problem was catched in filemap_nopage, but one
(of 57000) pages was dirty and had page->mapping == NULL... (maybe
only one was caused that this was just after bootup, with plenty of memory)
Maybe I should look at readahead code? Although to be clear I do not 
know why. Unless there is bug in logic in test program, it should 
first dirty pages, and AFTER that it should truncate - and unmap and 
exit, without ever touching pages of mapping... My first testcases were 
with this race (and with raw devices), but then I found (by removing 
more and more code) that no race (and no raw devices) are required...
 
> The point being: we should _never_ drop ->mapping unless the page is
> irrevocably going away. We can (and probably should) drop the off-limits
> page as soon as ->count hits zero, but we should not do it before that.

In case of truncate it is going irrevocable away. Accesses after truncate
should (and sometime give you) SIGBUS...

 total   used   free sharedbuffers cached
Mem:255768  42208 213560  0496  18420
-/+ buffers/cache:  23292 232476
Swap:   530136  13200 516936

Strace of another run:

1688  22:23:41.748438 execve("./oopsdemo", ["./oopsdemo"], [/* 18 vars */]) = 0
1688  22:23:41.749058 brk(0)= 0x8049ae8
1688  22:23:41.749399 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40016000
1688  22:23:41.749641 open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file 
or directory)
1688  22:23:41.749861 open("/etc/ld.so.cache", O_RDONLY) = 4
1688  22:23:41.750011 fstat(4, {st_mode=S_IFREG|0644, st_size=43818, ...}) = 0
1688  22:23:41.750253 old_mmap(NULL, 43818, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40017000
1688  22:23:41.750408 close(4)  = 0
1688  22:23:41.750538 open("/lib/libc.so.6", O_RDONLY) = 4
1688  22:23:41.750676 fstat(4, {st_mode=S_IFREG|0755, st_size=1057576, ...}) = 0
1688  22:23:41.750878 read(4, 
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\224\314"..., 4096) = 4096
1688  22:23:41.751173 old_mmap(NULL, 1072484, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) 
= 0x40022000
1688  22:23:41.751327 mprotect(0x4011e000, 40292, PROT_NONE) = 0
1688  22:23:41.751441 old_mmap(0x4011e000, 24576, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED, 4, 0xfb000) = 0x4011e000
1688  22:23:41.751633 old_mmap(0x40124000, 15716, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40124000
1688  22:23:41.751809 close(4)  = 0
1688  22:23:41.753291 munmap(0x40017000, 43818) = 0
1688  22:23:41.753554 getpid()  = 1688
1688  22:23:41.753785 fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(4, 1), ...}) = 0
1688  22:23:41.753993 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
1688  22:23:41.754162 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
1688  22:23:41.754430 write(1, "Go\n", 3) = 3
1688  22:23:41.754727 open("ram0", O_RDWR|O_CREAT, 0600) = 4
1688  22:23:41.754949 unlink("ram0")= 0
1688  22:23:41.755103 ftruncate(4, 234881024) = 0
1688  22:23:41.755228 old_mmap(NULL, 234881024, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 
0) = 0x40128000
1688  22:23:41.755700 pipe([5, 6])  = 0
1688  22:23:41.755847 pipe([7, 8])  = 0
1688  22:23:41.756024 fork()= 1689
1689  22:23:41.756735 close(6 
1688  22:23:41.756798 close(5 
1689  22:23:41.756844 <... close resumed> ) = 0
1688  22:23:41.756894 <... close resumed> ) = 0
1689  22:23:41.756949 close(7 
1688  22:23:41.756997 close(8 
1689  22:23:41.757041 <... close resumed> ) = 0
1688  22:23:41.757089 <... close resumed> ) = 0
1689  22:23:41.757139 close(4 
1688  22:23:41.757188 write(6, "\0", 1 
1689  22:23:41.757250 <... close resumed> ) = 0
1688  22:23:41.757301 <... write resumed> ) = 1
1689  22:23:41.757355 read(5,  
1688  22:23:41.757405 read(7,  
1689  22:23:41.757450 <... read resumed> "\0", 1) = 1
1689  22:23:49.260756 write(8, "\0", 1) = 1
1689  22:23:49.436204 read(5,  
1688  22:23:49.442969 <... read resumed> "\

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro

On Mon, 23 Oct 2000, Linus Torvalds wrote:

> Al, any ideas? I have this feeling that the simplest fix is just to leave
> the race open, and make truncate_complete_page() just leave such a "racy"
> page in the page cache. It will still race, and the invalid page will
> still exist, but the end result should be harmless.

Provided that we clean it - why the hell do we want to take it out of
the pagecache? I don't see any fundamental reasons to prohibit pages
past the ->i_size being hashed. Methods _must_ check for ->i_size,
but they do it anyway. All race-prevention is based on page locks and
->i_sem.

Yes, filemap_nopage() should check for i_size at the very end and fail if
the page became off-limits. But that's completely unrelated issue - it's
mmap semantics, not pagecache one.

The point being: we should _never_ drop ->mapping unless the page is
irrevocably going away. We can (and probably should) drop the off-limits
page as soon as ->count hits zero, but we should not do it before that.

Comments?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> > I'll take a better look at the truncate case (I consider the invalidate
> > case closed). Do you have a simple test-program around?
> 
> Well, I cannot say simple. As I was not able to reproduce it with only
> one task, code below:

Ok, without running this I can already guess at what's up.

One process does the "truncate()", and races with another process that
does a page-in.

The truncate code does

notify_change(ATTR_SIZE);

which will basically cause a "vmtruncate(inode, newsize)".

Just before this happened, the other process gets a page fault, and does a
page_cache_read(). Now, because we are low on memory (this is why it only
shows up when you're swapping), that read will take a while. During that
time, the vmtruncate() starts to execute, and sets inode->i_size to the
new value (that would cause us to no longer accept a page fault - but we
already got past that check in the faulting process). It will then
invalidate all the inode pages > i_size.

Now, the page faulting process comes back, and puts the page into the VM
space. Never mind that it has in the meantime gotten i_mapping = NULL due
to the other process doing the truncate.

Now, if I'm right, you should be able to add something like

if (!old_page->mapping)
printk("mapping went away from under us\n");

to just before the "return old_page()" case in the success path of
filemap_nopage() (mm/filemap.c), and you should see that printk() trigger
when the bug happens.

(There are other users too that are not synchronized wrt inode size
changes and could get an access to a page past the end of the file this
way - I just think that filemap_nopage is probably the one where this is
most easily seen).

The above is obviously not a bug-fix, it's just a validation of the
theory.

Al, any ideas? I have this feeling that the simplest fix is just to leave
the race open, and make truncate_complete_page() just leave such a "racy"
page in the page cache. It will still race, and the invalid page will
still exist, but the end result should be harmless.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec

On 23 Oct 00 at 12:09, Linus Torvalds wrote:
> On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> > 
> > Hi Linus,
> >   unfortunately, this does not explain problem I reported. In my case,
> > underlying fs is ext2, and there is no locking around - only one task
> > does ftruncate() on (big) shareable mapped file (original code does it to
> > prevent dirty pages to be written to disk after their contents is no
> > longer relevant).
> 
> Yes. Yours is a separate problem, and is not due to invalidate_pages() at
> all, but apparently due to truncate_complete_page().
> 
> Basically, all the same old arguments do apply - we cannot just remove the
> page from the page cache if it is mapped.
> 
> Now, the truncate() case is different, because the code _tries_ to also
> zap the actual mappings. The fact that it fails to do so is a bit
> unnerving, actually. 
> 
> I'll take a better look at the truncate case (I consider the invalidate
> case closed). Do you have a simple test-program around?

Well, I cannot say simple. As I was not able to reproduce it with only
one task, code below:
(1) mlocks itself in memory (you can remove it)
(2) creates 224MB file and maps it shareable into memory (224 is
for 256MB machine; I was not able to reproduce it with less than 100MB
file on 256MB machine) (and you must have 224MB of free disk space
in current directory, otherwise...) and unlinks it
(3) fork
(4) child process dirties memory
(5) parent signals child to exit and wait 5 secs
(6) child exits (hopefully...)
(7) parent unmaps data
(8) parent exits
You'll always get page->mapping == NULL dereference... And btw, if
I start code below on my machine (dual PIII/450), after 2 secs of run
whole machine stops - only ping and console switching works - and stays
that way for more than 20 secs. After that program finishes (killing
kernel if no antioops in filemap_write_page present...) and everything
returns to normal (if "-/+ buffers/cache: 28848  226920" is normal ;-) ).

  You can look at "http://platan.vc.cvut.cz/listit" for dump of "struct page"
at time when page->mapping == NULL was catched in filemap_write_page,
or through linux-kernel@ archive searching for 'page->mapping == NULL'
during last month, either from me or from Quintella.
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

unsigned char zero;

#define MSIZE 0x0E00

void x4768(void) {
int fd;
pid_t pid;
int x4778_1[2];
int x4778_2[2];
int from4778;
int to4778;
char* mma[65536];
unsigned int mml[65536];
unsigned int mmi = 0;
unsigned int ln = 0;
unsigned int x;

#define MAL2(a,l) ftruncate(fd, ln+l); mml[mmi] = l; mma[mmi++] = mmap(a, l, 
PROT_READ|PROT_WRITE, MAP_SHARED, fd, ln); ln += l; mma[mmi-1][0] = 99;
#define MAL(l) MAL2(NULL,l)

fd = open("ram0", O_RDWR | O_CREAT, 0600);
unlink("ram0");
MAL(MSIZE);
pipe(x4778_1);
pipe(x4778_2);
pid = fork();
if (!pid) {
int from4768 = x4778_1[0];
int to4768 = x4778_2[1];
close(x4778_1[1]);
close(x4778_2[0]);
close(fd);
read(from4768, &zero, 1);
for (x = 0; x < mml[mmi - 1]; x += 4096)
mma[mmi-1][x] = 98;
write(to4768, &zero, 1);
read(from4768, &zero, 1);
exit(0);

} else if (pid < 0) {
perror("fork failed");
exit(222);
}
to4778 = x4778_1[1];
from4778 = x4778_2[0];
close(x4778_1[0]);
close(x4778_2[1]);

{
write(to4778, &zero, 1);
read(from4778, &zero, 1);

write(to4778, &zero, 1);
sleep(5);
ftruncate(fd, 0);
}
while (mmi--) {
munmap(mma[mmi], mml[mmi]);
}
exit(0);
}

int main(int argc, char* argv[]) {
printf("Go\n");
mlockall(MCL_CURRENT);

x4768();
return 0;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> Hi Linus,
>   unfortunately, this does not explain problem I reported. In my case,
> underlying fs is ext2, and there is no locking around - only one task
> does ftruncate() on (big) shareable mapped file (original code does it to
> prevent dirty pages to be written to disk after their contents is no
> longer relevant).

Yes. Yours is a separate problem, and is not due to invalidate_pages() at
all, but apparently due to truncate_complete_page().

Basically, all the same old arguments do apply - we cannot just remove the
page from the page cache if it is mapped.

Now, the truncate() case is different, because the code _tries_ to also
zap the actual mappings. The fact that it fails to do so is a bit
unnerving, actually. 

I'll take a better look at the truncate case (I consider the invalidate
case closed). Do you have a simple test-program around?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec


On 23 Oct 00 at 10:33, Linus Torvalds wrote:
> Trond Myklebust  <[EMAIL PROTECTED]> wrote:
> >
> >As for simply settling for a self-consistent mmap() rather than
> >tackling the problem of rereading; the main crime is that you're
> >rendering file locking unusable.
...
> This is not a crime.
> 
...
> And yes, I know that file locking can try to be nice in both of these
> cases.  I know that there are systems that try to revert shared mappings
> etc upon file locking (or not allow file locking at all if mappings are
> present).  Linux is not one of them, and never has been. Neither are
> most UNIXes out there.
> Linus

Hi Linus,
  unfortunately, this does not explain problem I reported. In my case,
underlying fs is ext2, and there is no locking around - only one task
does ftruncate() on (big) shareable mapped file (original code does it to
prevent dirty pages to be written to disk after their contents is no
longer relevant). And for some reason, not all instance of shared mapping 
are invalidated - some are still connected to underyling pages, but 
these pages have page->mapping = NULL... (and no, in testcase there
is no vmware around). So on task exit kernel dies with page->mapping
dereference in filemap_write_page (->filemap_sync->filemap_unmap->exit_mmap->
->mmput->do_exit)...
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Trond Myklebust  <[EMAIL PROTECTED]> wrote:
>
>As for simply settling for a self-consistent mmap() rather than
>tackling the problem of rereading; the main crime is that you're
>rendering file locking unusable.

This is not a crime.

Anybody who uses file locking over NFS is buggy and nobody sane should
expect it to work reliably. There are good reasons why mail agents etc
depend on other kinds of locking over NFS.

Furthermore, anybody who expects file locking to work _anywhere_ (ie
including local filesystems) in the presense of file mapping is just so
out to lunch that it's ridiculous. 

Neither of these issues have anything to do with Linux. They are just
statements of fact.

And yes, I know that file locking can try to be nice in both of these
cases.  I know that there are systems that try to revert shared mappings
etc upon file locking (or not allow file locking at all if mappings are
present).  Linux is not one of them, and never has been. Neither are
most UNIXes out there.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro

On Thu, 19 Oct 2000, Linus Torvalds wrote:

> You'd have to do something like
> 
>   LockPage(page); /* Nobody gets to write to this page (except 
>through mmaps, ugh) */
>   gather_all_mmap_users(page);/* THIS is the nasty one */

Wait a second. invalidate_inode_pages() has no idea of range, right? Finding
all VMAs of shared mappings for given inode is not a big deal. Sure,
repeating it for each bloody page would be painful at extreme, but
"make sure that every access to address within _that_ VMA will result in
pagefault" looks like a reasonable operation. Basically, we have
two sides - pagecache and many VMAs. And loop over VMAs with per-VMA
operations sounds more reasonable than loop over pages. We _can_ block
pageins without messing with pages themselves (it starts with finding VMA,
after all) and we can block new shared mappings (not a big deal).

Comments? Basically, I propose per-VMA rw-semaphore taken on page-in for
read and on that "flush and make sure we'll reread" operation for write +
rw-sem on i_mmap_shared. We could even make invalidation work for less
than whole file, all we need for that is skipping the VMAs out of the
range. Ingo, Linus?

>   nfs_wb_page(page);  /* force write-back on this page */
>   ClearPageUptodate(page);/* mark it not up-to-date to force a read-in 
>next time */
>   UnlockPage(page);   /* Ok, now the client can go wild */
> 
> where everything but the "gather_all_mmap_users()" part is fairly
> straightforward.  The "gather" phase is nasty - it would need to figure
> out every place the page is mapped, make sure those are synchronized (ie
> something like marking the page table entry write-protected and causing a
> TLB invalidate SMP cross-call - at which point the resulting page fault
> and the page lock will catch anybody who tries to write to the page)..

> In no case could you do something like what the current
> invalidate_inode_pages() does, which is to just try to drop the page from
> the cache - that really only works if we're the only user of that page,
> which the "page_count != 1" test now enforces.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust

> " " == Alexander Viro <[EMAIL PROTECTED]> writes:

 > Again, consider the case when two processes share the
 > mapping. Process A has page faulted in. Page is
 > invalidated. Process B tries to access the same page. If you
 > leave it in page tables of A you _MUST_ leave it in
 > cache. Period. Otherwise A and B will have different instances
 > of the page.

Even so, you want to reread it. When I said automatically msync(), I
meant 'schedule a write of what has changed and then do whatever you
need to do to get the page reread'. (You explicitly asked for
semantics, not an implementation)

 > And rereading the thing might be tolerable _only_ if there is
 > another client that had changed the file.  Even if you msync()
 > everything, you have to deal with plain and boring memory
 > modifications done by a process that did that bloody mmap(). If
 > they happen while you are reading the data from server - too
 > fscking bad, you'ld better have a good excuse for destroying
 > the data. write() from another client _is_ a good excuse. But
 > from my reading of fs/nfs/* it looks like we do that (cache
 > invalidation) left, right and center in cases that have nothing
 > to another clients.

That's because under NFS you don't have a cache consistency
protocol. Nothing tells you that the file/directory has changed and
that you have to resync your cache. Instead, you have to infer it from
the fact that some operation has returned file attributes that are
screwy.
In addition you may want to force a reread, because some operation
just changed a directory on the server, and you don't know what else
changed beforehand.

 > IOW, I think that invalidate_inode_pages() is bogus. There is
 > only one situation when we have a right to remove page from
 > pagecache - when it is not mapped anywhere.

The issue is not about removing pages. It's about forcing a reread of
the cached data from the server. Removing the actual pages from the
cache has so far been the only race-free method for doing this (since
pre-2.2.x at least) while ensuring that at least generic 'read',
'readdir' and 'write' work as expected.
Yes it screws up mmap() and should be fixed but without breaking what
little that works please.

As for simply settling for a self-consistent mmap() rather than
tackling the problem of rereading; the main crime is that you're
rendering file locking unusable.
Locking is the case in which you have to issue a guarantee that the
cache is consistent between client and server within the area covered
by the lock. In all other cases you *could* get away with the partial
cache invalidation implementation by arguing that there no consistency
guarantees inherent in the protocol.

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro

On Fri, 20 Oct 2000, Trond Myklebust wrote:

> For the general case of the page cache I think we can keep them quite
> simple:
> 
> + We do in any case want to drop all pages that are unreferenced. (The
> reason for flushing may be that the file size has changed.)
> 
> + For pages that are referenced (and unlocked) we would like to force
> them to get read in anew ASAP. How this is done in practice is
> irrelevant as far as NFS is concerned provided that we don't sleep on
> any I/O while in nfs_zap_caches()/invalidate_inode_pages().
> 
> The lower level stuff can and will sort out the business of flushing
> out pending writebacks that conflict with the read, so that isn't a
> problem for the VFS/VM.
> 
> The problem lies with writes that haven't yet been msync()ed (and
> hence do not have writebacks). For shared mappings, one should perhaps
> schedule an automatic msync() of the dirty pages (???). For private
> mappings, perhaps the best thing would be to defer the read?

Again, consider the case when two processes share the mapping. Process A
has page faulted in. Page is invalidated. Process B tries to access the
same page. If you leave it in page tables of A you _MUST_ leave it in
cache. Period. Otherwise A and B will have different instances of the
page.

It's not about writebacks. If you map something with MAP_SHARED and
fork() afterwards you _MUST_ have the same data at the address returned by
mmap() until one of the processes unmaps the thing.

And rereading the thing might be tolerable _only_ if there is another
client that had changed the file.  Even if you msync() everything, you
have to deal with plain and boring memory modifications done by a process
that did that bloody mmap(). If they happen while you are reading the data
from server - too fscking bad, you'ld better have a good excuse for
destroying the data. write() from another client _is_ a good
excuse. But from my reading of fs/nfs/* it looks like we do that (cache
invalidation) left, right and center in cases that have nothing to another
clients.

IOW, I think that invalidate_inode_pages() is bogus. There is only one
situation when we have a right to remove page from pagecache - when it is
not mapped anywhere.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro

On 20 Oct 2000, Trond Myklebust wrote:

> > " " == Russell King <[EMAIL PROTECTED]> writes:
> 
>  > invalidate_inode_pages nfs_zap_caches nfs_lock fcntl_setlk
>  > do_fcntl sys_fcntl
> 
>  > So I guess that NFS locking is really bad if the region is
>  > mmapped!
> 
> Yep, but that's a symptom, not a cause. We want to be able to run
> invalidate_inode_pages() safely at any moment, since the need can be
> triggered externally (because the server and client page caches
> disagree).

So what exactly do you want it to do when page is mapped by user process?
Should it remain visible or not? What should happen if process writes to
that page?

Trond, I'm not asking about implementation - the question being what
semantics do you want for nfs_zap_caches() wrt user-mapped pages.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro

On 20 Oct 2000, Trond Myklebust wrote:

> Under NFS the problem is that pages can (and *should*) be invalidated
> despite there being pending write backs. The server can trigger the
> need for a cache invalidation at any time.

OK, so what should happen if user does mmap() on NFS file, dirties the
page and server tells that page should be invalidated? Should we still see
that page in process' address space? Silently making it anonymous is the
worst thing possible - it's still accessible through process page tables
and you (a) still see the (allegedly) invalidated data and (b) have no way
in hell to get the cache coherency between processes. I.e. you are in for
situations when we do mmap(), fork(), write data in child and attempt to
read from that address in parent and child yield different
results. Moreover, future attempts to modify the data by either of them
will be invisible in another.

IOW, what do you really want from the invalidation? It looks like the
right thing _might_ be "remove from all page tables", but that can become
really interesting with mlock() and friends.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro

On Fri, 20 Oct 2000, Roger Larsson wrote:

> Is it legal/good practice to unmap the file after closing it?
Yes.

> (Since the sharing needs the fd to mmap it)

It doesn't. Mapping needs struct file * and it doesn't care about
fd. mmap() takes a reference to struct file by fd you've passed and after
that we can forget about descriptors - vma_struct holds a reference to
file and that's it.

> Successful unlinking a file should probably free pages directly to
> free list - might be worth optimizing for.

_Definitely_ no. Hell, unlink() doesn't mean that anything happens with
data - unlink an opened file and if that affects the read/write/lseek
you've found a bug. And mapping is equivalent to having opened descriptors
in that respect.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro




On Fri, 20 Oct 2000, Trond Myklebust wrote:

> > " " == David S Miller <[EMAIL PROTECTED]> writes:
> 
>  > Actually, judging by the trace you provided Russell, I'd say
>  > this is some peculiarity with NFS silly rename handling, and
>  > it'd be best to look for the bug in that code (early inode
>  > reference loss, for example?)
> 
> Russel's trace indicates that the unlink actually has completed and
> has become a negative dentry since the file is labelled '(deleted)'.

No, it doesn't. Proof:

exec 3>/tmp/foo
rm /tmp/foo
ls -l /proc/$$/fd/3

and dentry is _not_ negative.

> That means that the dentry count must have been zero, so that
> dentry_iput() was called.

It doesn't and it wasn't.

> I don't see how dentry_iput() can be called on an open file. In
> principle the dentry count should always be >= 1, so unless there is
> some place where we're calling d_delete() without get()ing the dentry
> first, there should be no path for early inode loss.

d_delete() will not make dentry negative if there are other
users. However, it will make it unhashed in such case, ergo
(deleted) thing.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds

[ Final comment, and then I'll shut up ]

On Thu, 19 Oct 2000, Linus Torvalds wrote:
> 
> You'd have to do something like
> 
>   LockPage(page); /* Nobody gets to write to this page (except 
>through mmaps, ugh) */
>   gather_all_mmap_users(page);/* THIS is the nasty one */
>   nfs_wb_page(page);  /* force write-back on this page */
>   ClearPageUptodate(page);/* mark it not up-to-date to force a read-in 
>next time */
>   UnlockPage(page);   /* Ok, now the client can go wild */

Note that one approach would be to just make invalidate_inode_pages() do
exactly the above.

Now, the gather_all_mmap_users() part is definitely a post-2.4.x thing,
and we can't do nfs_wb_page() in invalidate_inode_page() either, but what
we CAN do is to clear the Uptodate flag (and we do need the page lock to
do so).

The advantage of clearing the uptodate flag (as opposed to doing what we
do now - dropping the page altogether) is that there would be no cache
aliasing issues, and there would be no issues with a page and its
associated data just "disappearing" from under somebody. It would cause
the page to be read in again the next time it is faulted in or somebody
does a read() on it, and that's exactly what we want.

However, we _do_ need to WB the page data some way - but the decision on
whether to invalidate write-backs or finish them would have to be up to
the low-level filesystem.

(We should _not_ allow people to read in the page, and leave some stale
write-back information there, so that we'd write back stuff that we just
read because somebody else noticed that the page was not up-to-date. That
would be just an endless source of (a) confusion and (b) unnecessary
network traffic.)

We could, of course, have some combination of the two: if there is only
one user (us), we can just drop the page, otherwise we can mark it
non-up-to-date to force future readers to re-validate the dang thing.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds

On Thu, 19 Oct 2000, Linus Torvalds wrote:
> 
> I'm saying that we're much much better off guaranteeing local consistency
> over knowingly breaking local consistency over a uncertain global
> consistency issue. Especially as NFS has never guaranteed global
> consistency in the first place, and was not designed to do that.

Btw, if you really worry about NFS v3 and want to add consistency
guarantees to NFS v3, you can probably do that. It's just that you cannot
do it with a simple cache invalidation - you'd need to be much much more
careful (the same way a simple CPU cache invalidate does not guarantee SMP
cache consistency - it only guarantees data corruption due to missed
writes).

I don't know what the best way to do a true cache consistency protocol on
a page leve would be, especially as you will have a _really_ hard time
getting hold of an exclusive pointer to a page with multiple mmap's, but
you could probably get something that guarantees cache consistency as long
as people do _not_ expect mmap to always be 100% consistent.

(The reason you cannot get mmap consistency is just that mmap doesn't have
any kernel synchronization point unless you're willing to shoot down every
single mapping - which is expensive as hell, but doable).

You'd have to do something like

LockPage(page); /* Nobody gets to write to this page (except 
through mmaps, ugh) */
gather_all_mmap_users(page);/* THIS is the nasty one */
nfs_wb_page(page);  /* force write-back on this page */
ClearPageUptodate(page);/* mark it not up-to-date to force a read-in 
next time */
UnlockPage(page);   /* Ok, now the client can go wild */

where everything but the "gather_all_mmap_users()" part is fairly
straightforward.  The "gather" phase is nasty - it would need to figure
out every place the page is mapped, make sure those are synchronized (ie
something like marking the page table entry write-protected and causing a
TLB invalidate SMP cross-call - at which point the resulting page fault
and the page lock will catch anybody who tries to write to the page)..

In no case could you do something like what the current
invalidate_inode_pages() does, which is to just try to drop the page from
the cache - that really only works if we're the only user of that page,
which the "page_count != 1" test now enforces.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust

> " " == Linus Torvalds <[EMAIL PROTECTED]> writes:

 > which is really really bad, because now you have the case that
 > you have 'n' copies of the same page in memory, with 'n' users,
 > out of which 'n-1' users have the wrong page. And those 'n-1'
 > users don't even have any way of _knowing_ that they have the
 > wrong page.

 > Which is why we MUST NOT drop a page that has users. Really.

 > I'm telling you that cases #4 and #5 are _much_ worse than your
 > "solution" to case #2. And you argue that your solution is good
 > only because you're completely ignoring cases #4 and #5.

No. I'm arguing (at 4:40am and while trying to keep one eye on our
detector's data acquisition) on the basis that whoever holds the file
lock has to have a guarantee of obtaining 100% accuracy on the locked
region.

I agree that dropping pages is ugly and that it will always give
problems with shared mmap(), so if it can be shown that clearing
PG_uptodate and rereading the same page will give the required
guarantee on locking, then I'm not going to complain.

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds

On 20 Oct 2000, Trond Myklebust wrote:
> 
> The problem here is that NFS pages have 3 rather than 2 states:
>   1) mmapped & correct.
>   2) mmapped & incorrect. (but possibly dirty)
>   3) Unmapped
> 
> For case 1), we clearly want to have the page in inode->i_mapping.
> For cases 2) & 3) we don't.

I think you're WRONG, WRONG, WRONG.

Your case #2 is not at all a state.

It's "mapped, possibly dirty, and locally correct. But _maybe_ globally
incorrect".

If you remove the page in case #2 from the hashing, you will have case #4,
which you're ignoring:

   4) Totally, and utterly incorrect. Without any way for the application
  to even _know_ that it's incorrect.

Note that case 4 is accompanied by case #5, later on:

   5) two separate pages both mapped, one hashed and up-to-date, the other
  mapped in other processes but incorrect.

which is really really bad, because now you have the case that you have
'n' copies of the same page in memory, with 'n' users, out of which 'n-1'
users have the wrong page. And those 'n-1' users don't even have any way
of _knowing_ that they have the wrong page.

Which is why we MUST NOT drop a page that has users. Really.

I'm telling you that cases #4 and #5 are _much_ worse than your "solution"
to case #2. And you argue that your solution is good only because you're
completely ignoring cases #4 and #5.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds

On 20 Oct 2000, Trond Myklebust wrote:
> 
>  > invalidate_inode_pages nfs_zap_caches nfs_lock fcntl_setlk
>  > do_fcntl sys_fcntl
> 
>  > So I guess that NFS locking is really bad if the region is
>  > mmapped!
> 
> Yep, but that's a symptom, not a cause. We want to be able to run
> invalidate_inode_pages() safely at any moment, since the need can be
> triggered externally (because the server and client page caches
> disagree).

Well, the thing is, that if somebody has a page mapped, there's nothing we
can do if the server and client disagrees. 

We just cannot invalidate the page - people would actually lose data, and
would lose major local consistency guarantees that UNIX filesystems are
supposed to get. 

So it's a matter of one sort of consistency against another - and I think
the local consistency requirements are the stricter ones, considering that
we don't even _know_ whether the server has really changed those pages or
not..

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds

On 20 Oct 2000, Trond Myklebust wrote:
> 
> Under NFS the problem is that pages can (and *should*) be invalidated
> despite there being pending write backs. The server can trigger the
> need for a cache invalidation at any time.
> The existence of file locks that aren't page aligned, as well as
> partial page writeback ensure that we cannot make the equivalence
>page has pending write == page is correct.

Note that this is not what my patch really says at all.

My patch says "ok, we should invalidate this page, because somebody thinks
it may be bad. HOWEVER, we cannot reasonably do that, because we have
users of the page (which may be completely unrelated to pending writes),
and invalidating this page is _certain_ to cause corruption".

Basically think of the case of somebody having a shared mmap over NFS. He
has dirty data in his pages, and he expects those to be written out to the
server. There is NO QUESTION about this fact.

Imagine somebody (possibly even the same person) then releasing or getting
a file lock, possibly in some other part of the file. 

Without that added test, all those dirty local pages just got completely
thrown away. 

Now, you have a choice: you can KNOW that you're doing something horribly
and utterly wrong (throwing away data), or you can SUSPECT that you might
cause non-coherency between the client and the server. You cannot get
both.

I'm saying that we're much much better off guaranteeing local consistency
over knowingly breaking local consistency over a uncertain global
consistency issue. Especially as NFS has never guaranteed global
consistency in the first place, and was not designed to do that.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust

> " " == Linus Torvalds <[EMAIL PROTECTED]> writes:

 > The advantage of clearing the uptodate flag (as opposed to
 > doing what we do now - dropping the page altogether) is that
 > there would be no cache aliasing issues, and there would be no
 > issues with a page and its associated data just "disappearing"
 > from under somebody. It would cause the page to be read in
 > again the next time it is faulted in or somebody does a read()
 > on it, and that's exactly what we want.

I've proposed this in the past, but there you refused on the grounds
that it breaks assumptions in the VFS. I'm otherwise happy with this
sort of thing. It ensures both shared mmap() and cache consistency.

The only problem I can think of would be for dirty pages that haven't
yet been msync()ed, however I'm far from being competent to evaluate
any side-effects on the VM.

 > However, we _do_ need to WB the page data some way - but the
 > decision on whether to invalidate write-backs or finish them
 > would have to be up to the low-level filesystem.

This is already in place.

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust

> " " == Linus Torvalds <[EMAIL PROTECTED]> writes:

 > Btw, that "invalidate_inode_pages()" thing is just wrong - we
 > can't just remove pages that are mapped etc, because that would
 > result in no end of fun aliasing problems etc.

 > How about adding a test in invalidate_inode_pages() like

 >  /* We cannot invalidate a locked page */ if
 >  (TryLockPage(page))
 >  continue;

 > + /* We cannot invalidate a page that is in use */
 > + if (page_count(page) != 1) {
 > + UnlockPage(page);
 > + continue;
 > + }
 > +
 >  __lru_cache_del(page); __remove_inode_page(page);

The problem here is that NFS pages have 3 rather than 2 states:
  1) mmapped & correct.
  2) mmapped & incorrect. (but possibly dirty)
  3) Unmapped

For case 1), we clearly want to have the page in inode->i_mapping.
For cases 2) & 3) we don't.

However for case 2) we still have a weak association to the inode
itself, and we want to be able to reference inode metadata etc.  Would
it make sense then to remove these pages from i_mapping, but to hang
them onto a new struct address_space (call it i_unmapped for want of a
better name)?

That would allow you to keep a consistent state for the page, while
still allowing you to 'invalidate' it (by removing it from the
i_mapping) and hence maintain a consistent cache.

invalidate_inode_pages() would then reduce to

   remove_page_from_inode_queue(page);
   remove_page_from_hash_queue(page);
   if (page_count(page))
 add_page_to_inode_unmapped(page);

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Russell King

Alexander Viro writes:
> Trond, I'm not asking about implementation - the question being what
> semantics do you want for nfs_zap_caches() wrt user-mapped pages.

Ok, looking through sendmail, and then db2, the situation is
created by the db2 library.  If the process does the following:

1. creates NFS file
2. sets file size by lseek + write
3. maps it
4. fcntl locks the file (writes above data back to file)
5. write to the mapping (makes pages dirty)
6. fcntl unlocks it (dirty page data is NOT written back)
7. closes it
8. unlinks it
9. some time later unmaps it(causes dirty data to be written back)

Note that (6) doesn't act as a barrier to synchronise writes in this case,
but it does for any normal write()s.  Surely NFS should cause any dirty
data associated with the file to be written back to the server no matter
what?

Although Linus' fix seems to prevent the problem, I get the feeling that it
is a sticky plaster over a much bigger problem.
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Petr Vandrovec


On 19 Oct 00 at 16:32, Linus Torvalds wrote:

> How about adding a test in invalidate_inode_pages() like
> 
> /* We cannot invalidate a locked page */
> if (TryLockPage(page))
> continue;
> 
> +   /* We cannot invalidate a page that is in use */
> +   if (page_count(page) != 1) {
> +   UnlockPage(page);
> +   continue;
> +   }
> +
> __lru_cache_del(page);
> __remove_inode_page(page);
 
Hi Linus,
  this did not fix problem with my testcase - 
mmap(shared),fork,dirty_it,ftruncate,finish - does NOT go through
this code (invalidate_inode_pages) at all (should it?).

  I instrumented both swapout in page_io, and __remove_inode_page,
and pages are going through __remove_inode_page with strange use
count. I do not know whether it is good or no... During normal life
of system, pages with count==2 && index==0 are passed to __remove_inode_page.
As soon as I start my test, non-zero index pages with count > 2 are
passed here. Later this pages appear in filemap_sync...
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]

vana:~# free
 total   used   free sharedbuffers cached
Mem:255768 203232  52536  0   4084  35496
-/+ buffers/cache: 163652  92116
Swap:   530136  0 530136
vana:~# free
 total   used   free sharedbuffers cached
Mem:255768 203376  52392  0   4084  35636
-/+ buffers/cache: 163656  92112
Swap:   530136  0 530136

vana:~# free
 total   used   free sharedbuffers cached
Mem:255768  72172 183596  0   4084  32888
-/+ buffers/cache:  35200 220568
Swap:   530136  0 530136
vana:~# free
 total   used   free sharedbuffers cached
Mem:255768  72180 183588  0   4084  32888
-/+ buffers/cache:  35208 220560
Swap:   530136  0 530136


11:57:44 Bad page count in __remove_inode_page!
11:57:44   page: c1349ef8
11:57:44 mapping:  cc65c6dc
11:57:44 index:0
11:57:44 nexthash: 
11:57:44 count:2
11:57:44 flags:0x0009
11:57:44 age:  0
11:57:44 pprevhash: 
11:57:44 buffers:  
11:57:44 virtual:  cc61a000
11:57:44 zone: c021f578
11:57:44 pagedump done
11:57:44 Bad page count in __remove_inode_page!
11:57:44   page: c1120394
11:57:44 mapping:  cc6b2edc
11:57:44 index:32767
11:57:44 nexthash: 
11:57:44 count:3
11:57:44 flags:0x0009
11:57:44 age:  5
11:57:44 pprevhash: 
11:57:44 buffers:  
11:57:44 virtual:  c43d1000
11:57:44 zone: c021f578
11:57:44 pagedump done

11:57:46 Bad page count in __remove_inode_page!
11:57:46   page: c1266298
11:57:46 mapping:  cc6b2edc
11:57:46 index:13635
11:57:46 nexthash: 
11:57:46 count:3
11:57:46 flags:0x0009
11:57:46 age:  5
11:57:46 pprevhash: 
11:57:46 buffers:  
11:57:46 virtual:  c9082000
11:57:46 zone: c021f578
11:57:46 pagedump done

11:57:46 page->mapping == NULL
11:57:46 error:  -22
11:57:46 ptep:   c92f6a2c
11:57:46 pteval: 09062027
11:57:46 vma:ce926420
11:57:46   vm_mm:ccd67d60
11:57:46   vm_start: 40128000
11:57:46   vm_end:   48128000
11:57:46   vm_next:  c1490b20
11:57:46   vm_avl_height:  1
11:57:46   vm_avl_left:
11:57:46   vm_avl_right:   
11:57:46   vm_next_share:  
11:57:46   vm_pprev_share: ccdee684
11:57:46   vm_operations_struct: c021f1a0
11:57:46   vm_pgoff:  
11:57:46   vm_file:   cd4270e0
11:57:46   vm_raend:  
11:57:46   vm_private_data:  
11:57:46 address:4368B000
11:57:46 flags:  0001
11:57:46   file: cd4270e0
11:57:46 dentry:   cc8ae440
11:57:46   inode: cc6b2e40
11:57:46   num: 848677
11:57:46   dev: 0x0302
11:57:46   path: /usr/src/tst/ram0 (deleted)
11:57:46 vfsmount: c14ca8c0
11:57:46 op:   c0221bc0
11:57:46 count:4
11:57:46 flags:0x0002
11:57:46 mode: 03
11:57:46 pos:  0
11:57:46 reada:0
11:57:46 ramax:0
11:57:46 raend:0
11:57:46 ralen:0
11:57:46 rawin:0
11:57:46 owner.pid: 0
11:57:46 owner.uid: 0
11:57:46 owner.euid: 0
11:57:46 owner.signum: 0
11:57:46 uid:  0
11:57:46 gid:  0
11:57:46 error:0
11:57:46 version:  11641
11:57:46 private_data: 
11:57:46   page: c1265a18
11:57:46 mapping:  
11:57:46 index:

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust

> " " == Alexander Viro <[EMAIL PROTECTED]> writes:

 > So what exactly do you want it to do when page is mapped by
 > user process?  Should it remain visible or not? What should
 > happen if process writes to that page?

 > Trond, I'm not asking about implementation - the question being
 > what semantics do you want for nfs_zap_caches() wrt user-mapped
 > pages.

For the general case of the page cache I think we can keep them quite
simple:

+ We do in any case want to drop all pages that are unreferenced. (The
reason for flushing may be that the file size has changed.)

+ For pages that are referenced (and unlocked) we would like to force
them to get read in anew ASAP. How this is done in practice is
irrelevant as far as NFS is concerned provided that we don't sleep on
any I/O while in nfs_zap_caches()/invalidate_inode_pages().

The lower level stuff can and will sort out the business of flushing
out pending writebacks that conflict with the read, so that isn't a
problem for the VFS/VM.

The problem lies with writes that haven't yet been msync()ed (and
hence do not have writebacks). For shared mappings, one should perhaps
schedule an automatic msync() of the dirty pages (???). For private
mappings, perhaps the best thing would be to defer the read?

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds

On Fri, 20 Oct 2000, Trond Myklebust wrote:
> 
> The problem lies with writes that haven't yet been msync()ed (and
> hence do not have writebacks). For shared mappings, one should perhaps
> schedule an automatic msync() of the dirty pages (???). For private
> mappings, perhaps the best thing would be to defer the read?

Note that NONE of this is going to happen for 2.4.x.

We've never _ever_ done this before, there's no point in even suggesting
that this is suddenly a "critical" bug. It's not.

I want to know what the suggestion for 2.4.x is. Right now that's the "if
the count is elevated, we don't invalidate". 

Quite frankly, I don't see any other option. Doing the !Uptodate version
will lose local data as it stands now - in fact right now you'd lose data
that way even if you are the only client accessing the file, which is
obviously complete crap and _completely_ unacceptable.

I'm open to suggestions, but I haven't heard anything realistic.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust

> " " == Linus Torvalds <[EMAIL PROTECTED]> writes:

 > How about adding a test in invalidate_inode_pages() like

 >  /* We cannot invalidate a locked page */ if
 >  (TryLockPage(page))
 >  continue;

 > + /* We cannot invalidate a page that is in use */
 > + if (page_count(page) != 1) {
 > + UnlockPage(page);
 > + continue;
 > + }
 > +
 >  __lru_cache_del(page); __remove_inode_page(page);

 > Because otherwise we might end up invalidating pages that might
 > have pending write-back etc (although I think the NFS logic
 > tries to avoid invalidating when there's pending activity).

You may remember that I've been fighting a rearguard action against
this for some time.

Under NFS the problem is that pages can (and *should*) be invalidated
despite there being pending write backs. The server can trigger the
need for a cache invalidation at any time.
The existence of file locks that aren't page aligned, as well as
partial page writeback ensure that we cannot make the equivalence
   page has pending write == page is correct.

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust

> " " == Russell King <[EMAIL PROTECTED]> writes:

 > invalidate_inode_pages nfs_zap_caches nfs_lock fcntl_setlk
 > do_fcntl sys_fcntl

 > So I guess that NFS locking is really bad if the region is
 > mmapped!

Yep, but that's a symptom, not a cause. We want to be able to run
invalidate_inode_pages() safely at any moment, since the need can be
triggered externally (because the server and client page caches
disagree).

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust


> " " == Russell King <[EMAIL PROTECTED]> writes:

 > Indeed.  page->mapping is set to NULL in two places, one in
 > __remove_inode_pages() and the other one in the swap code after
 > we've checked that it was NULL.  I hadn't found the particular
 > call trace that caused us to ended up in __remove_inode_page()
 > though with this page.

Have you tried tagging invalidate_inode_pages() with a debugging
statement?

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust

> " " == David S Miller <[EMAIL PROTECTED]> writes:

 > Actually, judging by the trace you provided Russell, I'd say
 > this is some peculiarity with NFS silly rename handling, and
 > it'd be best to look for the bug in that code (early inode
 > reference loss, for example?)

Russel's trace indicates that the unlink actually has completed and
has become a negative dentry since the file is labelled '(deleted)'.
That means that the dentry count must have been zero, so that
dentry_iput() was called.

I don't see how dentry_iput() can be called on an open file. In
principle the dentry count should always be >= 1, so unless there is
some place where we're calling d_delete() without get()ing the dentry
first, there should be no path for early inode loss.

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King


Linus Torvalds writes:
> Can you get a full stack trace?

filemap_write_page
filemap_sync
filemap_unmap
do_munmap
sys_munmap

> How about adding a test in invalidate_inode_pages() like

(Added, along with a call to drop a stack trace out).

Yes, this does stop the problem in filemap_write_page.  The stack trace
here shows:

invalidate_inode_pages
nfs_zap_caches
nfs_lock
fcntl_setlk
do_fcntl
sys_fcntl

So I guess that NFS locking is really bad if the region is mmapped!
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Linus Torvalds

In article <8snvuj$1l0$[EMAIL PROTECTED]>,
Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>Hmm.. Looks like page->mapping was cleared by truncate_inode_pages()
>when the inode was free'd, and there was still write-back activity on
>one of the pages in question. 

Looking some more, the fact that the oops happens in filemap_sync_pte()
definitely means that we haven't gotten around to truncating the sucker
yet. So invalidate_inode_pages() looks like the more likely cause,
especially as the page writeback correctly increases the file count for
NFS, so we should never be able to get into the horror-schenario of
having write-back after the pages have been truncated due to the inode
getting cleared.

Does the simple added test for page counts to invalidate_inode_pages()
fix the problem for you, Russell?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Trond Myklebust  <[EMAIL PROTECTED]> wrote:
>> " " == Petr Vandrovec <[EMAIL PROTECTED]> writes:
>
> > You do not have to use NFS - look for my postings with
> > 'page->mapping == NULL' in archive. Your code uses shared mmap,
> > 'page->isn't
> > it? Probably shared by couple of processes.
>
>It's probably particularly nasty under NFS because of
>invalidate_inode_pages().

Btw, that "invalidate_inode_pages()" thing is just wrong - we can't just
remove pages that are mapped etc, because that would result in no end of
fun aliasing problems etc.

I don't see why invalidate_inode_pages() would be called for this
sequence, though, which is why I'd be more likely to blame the truncate
case, but I think you're right about invalidate_inode_pages() being a
problem none-the-less.

How about adding a test in invalidate_inode_pages() like

/* We cannot invalidate a locked page */
if (TryLockPage(page))
continue;

+   /* We cannot invalidate a page that is in use */
+   if (page_count(page) != 1) {
+   UnlockPage(page);
+   continue;
+   }
+
__lru_cache_del(page);
__remove_inode_page(page);

Because otherwise we might end up invalidating pages that might have
pending write-back etc (although I think the NFS logic tries to avoid
invalidating when there's pending activity).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Russell King  <[EMAIL PROTECTED]> wrote:
>Petr Vandrovec writes:
>> ... or from sys_exit() if you forget to unmap. Or from anywhere if
>> swapping code decides to swap such page. I'm trying to hunt it down
>> for more than month, but I have no idea what's wrong. In my case
>> way to trigger bug is:
>
>I actually think its as simple as:
>
>1. shared map a file
>2. close file
>3. unlink file
>4. unmap shared mapping
>
>In my case, I was running newaliases, and there wasn't any other
>processes using that mapping.

Hmm.. Looks like page->mapping was cleared by truncate_inode_pages()
when the inode was free'd, and there was still write-back activity on
one of the pages in question. 

Looking at the problem, it appears that the unmap() will (correctly) do
a writepage() for each dirty page, and that will (again correctly) cause
NFS to do a delayed write-out with all the proper things (increment page
count because of the pending write etc). The inode hasn't been free'd
yet, because we're still using it.

But then when we _do_ release the inode, we end up doing
truncate_inode_pages() and through that we clear page->mapping, so the
pending write-out is SOL.  I have this feeling that we probably should
NOT remove the mapping until the page count has gone down to zero. 

The other alternative would be to flush the pending write-outs (or wait
for them synchronously - but that's just stupid and inefficient), which
would imply that we'd better add the "flush" operation to "struct
address_space_operations". 

Can you get a full stack trace?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread David S. Miller

   From: Russell King <[EMAIL PROTECTED]>
   Date:Fri, 20 Oct 2000 00:07:55 +0100 (BST)

   Trond Myklebust writes:
   > It's probably particularly nasty under NFS because of
   > invalidate_inode_pages(). The latter empties the page cache whenever
   > we can no longer trust it and calls remove_inode_page() on every
   > unlocked page. It won't care whether the page is mmapped or not.
   > 
   > My guess is therefore that the line setting 'page->mapping = NULL' in
   > __remove_inode_page() is a candidate for scrutiny...

   Indeed.  page->mapping is set to NULL in two places, one in
   __remove_inode_pages() and the other one in the swap code after we've
   checked that it was NULL.  I hadn't found the particular call trace
   that caused us to ended up in __remove_inode_page() though with this
   page.

Actually, judging by the trace you provided Russell, I'd say this is
some peculiarity with NFS silly rename handling, and it'd be best to
look for the bug in that code (early inode reference loss, for
example?)

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King


Trond Myklebust writes:
> It's probably particularly nasty under NFS because of
> invalidate_inode_pages(). The latter empties the page cache whenever
> we can no longer trust it and calls remove_inode_page() on every
> unlocked page. It won't care whether the page is mmapped or not.
> 
> My guess is therefore that the line setting 'page->mapping = NULL' in
> __remove_inode_page() is a candidate for scrutiny...

Indeed.  page->mapping is set to NULL in two places, one in
__remove_inode_pages() and the other one in the swap code after we've
checked that it was NULL.  I hadn't found the particular call trace
that caused us to ended up in __remove_inode_page() though with this
page.
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust

> " " == Petr Vandrovec <[EMAIL PROTECTED]> writes:

 > You do not have to use NFS - look for my postings with
 > 'page->mapping == NULL' in archive. Your code uses shared mmap,
 > 'page->isn't
 > it? Probably shared by couple of processes.

It's probably particularly nasty under NFS because of
invalidate_inode_pages(). The latter empties the page cache whenever
we can no longer trust it and calls remove_inode_page() on every
unlocked page. It won't care whether the page is mmapped or not.

My guess is therefore that the line setting 'page->mapping = NULL' in
__remove_inode_page() is a candidate for scrutiny...

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King


Roger Larsson writes:
> Will it work correctly if 4. is done before 3. (even before 2?)
> Is it legal/good practice to unmap the file after closing it?
> (Since the sharing needs the fd to mmap it)

Dunno, haven't tried that yet.  I'll have a go tomorrow, but I think
it'll work correctly.  I'll also dig about in the VFS / NFS code some
more to see if I can find anything obvious.

> I would have expected this order:
>   A. open file
>   B. shared map file
>   C. unmap shared mapping
>   D. close file
>   E. unlink file

BTW, the program which is giving grief is good ol' sendmail, v8.11.
It does the unmap after close all over the place.
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Roger Larsson

Russell King wrote:
> 
> Petr Vandrovec writes:
> > ... or from sys_exit() if you forget to unmap. Or from anywhere if
> > swapping code decides to swap such page. I'm trying to hunt it down
> > for more than month, but I have no idea what's wrong. In my case
> > way to trigger bug is:
> 
> I actually think its as simple as:
> 
  0. open file
> 1. shared map file
> 2. close file
> 3. unlink file
> 4. unmap shared mapping

Will it work correctly if 4. is done before 3. (even before 2?)
Is it legal/good practice to unmap the file after closing it?
(Since the sharing needs the fd to mmap it)

I would have expected this order:
  A. open file
  B. shared map file
  C. unmap shared mapping
  D. close file
  E. unlink file

But in your case the unlink should be deferred to unmap time...
(if it is legal)

Successful unlinking a file should probably free pages directly to
free list - might be worth optimizing for.

/RogerL

Disclaimer:
 I do not really understand fs code...

> 
> In my case, I was running newaliases, and there wasn't any other
> processes using that mapping.
>_
>   |_| - ---+---+-
>   |   | Russell King[EMAIL PROTECTED]  --- ---
>   | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
>   | +-+-+ --- -+-
>   /   |   THE developer of ARM Linux  |+| /|\
>  /  | | | ---  |
> +-+-+ -  /\\\  |
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> Please read the FAQ at http://www.tux.org/lkml/

--
Home page:
  http://www.norran.net/nra02596/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King


Petr Vandrovec writes:
> ... or from sys_exit() if you forget to unmap. Or from anywhere if
> swapping code decides to swap such page. I'm trying to hunt it down
> for more than month, but I have no idea what's wrong. In my case
> way to trigger bug is:

I actually think its as simple as:

1. shared map a file
2. close file
3. unlink file
4. unmap shared mapping

In my case, I was running newaliases, and there wasn't any other
processes using that mapping.
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Petr Vandrovec


On 19 Oct 00 at 22:16, Russell King wrote:

> I'm seeing an oops caused by a NULL pointer dereference in mm/filemap.c,
> filemap_write_page.  The NULL pointer in question is page->mapping.
> The box on which this is happening is using a root NFS filesystem (in
> fact all but one of its filesystems are NFS).

You do not have to use NFS - look for my postings with 
'page->mapping == NULL' in archive. Your code uses shared mmap, isn't
it? Probably shared by couple of processes.

> So the file that was mapped has been deleted...  Ok, 
> With the following in filemap_sync_pte:
> 
> this reveals:
> 
> --> page c017e118 mapping NULL - address 40264000
> --> vma->vm_file = c0c8b120
> --> /var/tmp/.nfs16130002 (deleted)
> 
> And the stack trace indicates that the kernel execution path came from
> sys_munmap.

... or from sys_exit() if you forget to unmap. Or from anywhere if
swapping code decides to swap such page. I'm trying to hunt it down
for more than month, but I have no idea what's wrong. In my case
way to trigger bug is:

(1) shared map some page
(2) unlink underlying file
(3) fork
(4)   from child dirty pages
(5)   from parent truncate file
(6) exit both tasks
(7) oops

Just in case you'll not find oops demo code in archive... Last tested 
under 2.4.0-test10-pre3. With -pre4 I cannot decide whether to comment 
out BUG() in vmscan, or not, so I did not test -pre4 yet... Maybe you
can simplify code even more. During tests (dual PIII/450, 256MB RAM)
system is completely dead for up to 40 secs, so do not panic. It is not
needed to hit reset button, just wait ;-) Any idea welcomed.
Best regards,
Petr Vandrovec
[EMAIL PROTECTED]

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

unsigned char zero;

#define MSIZE 0x0E00

void x4768(void) {
int fd;
pid_t pid;
int x4778_1[2];
int x4778_2[2];
int x4779_1[2];
int x4779_2[2];
int from4778;
int to4778;
int from4779;
int to4779;
char* mma[65536];
unsigned int mml[65536];
unsigned int mmi = 0;
unsigned int ln = 0;
unsigned int x;

#define MAL2(a,l) ftruncate(fd, ln+l); mml[mmi] = l; mma[mmi++] = mmap(a, l, 
PROT_READ|PROT_WRITE, MAP_SHARED, fd, ln); ln += l; mma[mmi-1][0] = 99;
#define MAL(l) MAL2(NULL,l)

fd = open("ram0", O_RDWR | O_CREAT, 0600);
unlink("ram0");
MAL(MSIZE);
pipe(x4778_1);
pipe(x4778_2);
pid = fork();
if (!pid) {
int from4768 = x4778_1[0];
int to4768 = x4778_2[1];
close(x4778_1[1]);
close(x4778_2[0]);
close(fd);
read(from4768, &zero, 1);
for (x = 0; x < mml[mmi - 1]; x += 4096)
mma[mmi-1][x] = 98;
write(to4768, &zero, 1);
read(from4768, &zero, 1);
exit(0);

} else if (pid < 0) {
perror("fork failed");
exit(222);
}
to4778 = x4778_1[1];
from4778 = x4778_2[0];
close(x4778_1[0]);
close(x4778_2[1]);

/* so that read/write to this fails... */
from4779 = -1;
to4779 = -1;
{
write(to4778, &zero, 1);
write(to4779, &zero, 1);
read(from4778, &zero, 1);
read(from4779, &zero, 1);

write(to4778, &zero, 1);
write(to4779, &zero, 1);
sleep(5);
ftruncate(fd, 0);
}
close(fd);
while (mmi--) {
munmap(mma[mmi], mml[mmi]);
}
exit(0);
}

int main(int argc, char* argv[]) {
printf("Go\n");
mlockall(MCL_CURRENT);

x4768();
return 0;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

61 matches

Mail list logo