Re: PC speaker driver patch for 2.4.0-test10-pre3

2000-10-24 Thread Oliver Xymoron

On Tue, 24 Oct 2000, Pavel Machek wrote:

> Hi!
> 
> > > [EMAIL PROTECTED] said:
> > > >  You can also pretty trivially keep track of an error term so that the
> > > > clock is right on average: 
> > > 
> > > True, but I don't want 'right on average'. I want 'not screwed with at all'.
> > > Shifting the timer tick onto the RTC will give me that. 
> > > 
> > > Even if we _do_ get the maths right, fudging with the frequency of the 
> > > system timer is still an ugly hack.
> > 
> > I personally think the system timer is already an ugly hack. HZ is
> > arbitrary, slow by modern standards, and introduces latency.
> > As the comment you quoted points out, it's also not very accurate.  
> > Much better would be an agenda structure with one shot timers between
> > events and jiffies based on cycle counters. This works on modern hardware
> > and scales well for higher processor speeds. 
> 
> ...and breaks horribly when your CPU frequency changes... like on most
> current notebooks.

The unit of time is arbitrary. Cycle counts just happens to be convenient
on most machines and very high resolution. The important point is that
hanging tasks off a periodic low-resolution timer is not scaling with CPU
clock speeds at all and there are ways to throw it out.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: PC speaker driver patch for 2.4.0-test10-pre3

2000-10-24 Thread Pavel Machek

Hi!

> > [EMAIL PROTECTED] said:
> > >  You can also pretty trivially keep track of an error term so that the
> > > clock is right on average: 
> > 
> > True, but I don't want 'right on average'. I want 'not screwed with at all'.
> > Shifting the timer tick onto the RTC will give me that. 
> > 
> > Even if we _do_ get the maths right, fudging with the frequency of the 
> > system timer is still an ugly hack.
> 
> I personally think the system timer is already an ugly hack. HZ is
> arbitrary, slow by modern standards, and introduces latency.
> As the comment you quoted points out, it's also not very accurate.  
> Much better would be an agenda structure with one shot timers between
> events and jiffies based on cycle counters. This works on modern hardware
> and scales well for higher processor speeds. 

...and breaks horribly when your CPU frequency changes... like on most
current notebooks.
Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-24 Thread Alexander Viro



On Tue, 24 Oct 2000, Petr Vandrovec wrote:

> On 23 Oct 00 at 23:05, Alexander Viro wrote:
> 
> > Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what
> > analysis had been done? Petr, can you reproduce the problem on -test7?
> > Unfortunately, clean test would take the backport of ext2 changes
> > (truncate-related, happened around the same time), but IIRC -test7 was
> > relatively stable...
> 
> Just FYI - I was able to reproduce it with 2.4.0-test5 and all laters
> (different pre-s of test6/7/8/9/10) I have. I was not able to reproduce 
> it with 2.2.17-pre13 (although this kernel had really bad behavior when
> I run out of space with backing file - sendmail suspended itself due
> to load of 37... on machine which has usually load 0.01). 
> 
> I do not have kernels between 2.2.17 and 2.4.0-test5 installed.
> 
> If you have anything I should check...

I see the race that can lead to screwups you've seen, but I don't see a
really clear way to close it.

Race in question:
truncate_inode_pages() removes the page from pagecache.
sync_pte is called before we get to killing the pagetable
references.

Potentially, the following might be more or less OK, but I'm not too
happy with it:
* All ->nopage() methods mark page uptodate (upon success, that
is).
* do_no_page() fails if page is not uptodate.
* vmtruncate starts with setting the size and going through the
pagecache, removing all fully truncated pages from the hash chains and
gathering them in the list and removing uptodate flag.
* then it does vmtruncate_list() calls, zapping the pagetable
references to the pages in question.
* *then* it goes through the list and does the rest of
truncate_inode_pages() work.

The theory being:
once the i_size is set, no new pages will be bound to affected
part of file.
when we unhash all affected pages and mark them non-uptodate there
will be no new pagetable references to them (they will not be found, and
any pagefault in progress will fail since we mark them non-uptodate).
any possible IO-in-progress on these pages will not be disturbed -
it doesn't depend on the page being hashed. Aliases do not matter, since
they won't be bound to the relevant part of file anyway.
once we kill all pagetable references (and no new references are
going to appear at that point) we can be sure that no process will
initiate _any_ operations on these pages. Notice that potential dirty data
on them doesn't matter - it's going to die anyway (file is being
truncated).
*now* we grab the page lock on each of these pages, thus letting
any method-in-progress to be completed and that point we can safely kill
->mapping.

Linus, could you comment on that variant?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-24 Thread Petr Vandrovec

On 23 Oct 00 at 23:05, Alexander Viro wrote:

> Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what
> analysis had been done? Petr, can you reproduce the problem on -test7?
> Unfortunately, clean test would take the backport of ext2 changes
> (truncate-related, happened around the same time), but IIRC -test7 was
> relatively stable...

Just FYI - I was able to reproduce it with 2.4.0-test5 and all laters
(different pre-s of test6/7/8/9/10) I have. I was not able to reproduce 
it with 2.2.17-pre13 (although this kernel had really bad behavior when
I run out of space with backing file - sendmail suspended itself due
to load of 37... on machine which has usually load 0.01). 

I do not have kernels between 2.2.17 and 2.4.0-test5 installed.

If you have anything I should check...
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro



On Mon, 23 Oct 2000, Linus Torvalds wrote:

> Note that if there really are only 9 "nopage" routines, then it is a lot
> easier to just add the single "SetPageUptodate(page)" into those 9
> routines, and thus let the VM know of the race.

Works for me. And yes, the list of ->nopage instances that can return
success is that short:
drm_vm_shm_nopage
drm_vm_shm_nopage_lock
drm_vm_dma_nopage
sgi_graphics_nopage
via_mm_nopage
ncp_file_mmap_nopage
shm_nopage
shmzero_nopage
filemap_nopage
- sorry, it's not 9+filemap_nopage, it's 8+filemap_nopage.

However, it still leaves a window for the race: we invalidate first and
remove from pagetables later. And ClearPageUptodate() is obviously
truncate_inode_pages() work. Grr...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Tue, 24 Oct 2000, Alexander Viro wrote:
> 
> It's not the only problem, but I would feel _much_ safer if pagefault
> wouldn't rely on pagecache miss. Actually... Hey. Why don't we do the
> insertion into page tables _within_ ->nopage()?

NO!

We used to do this a LOONG time ago.

Distributing the VM information on how to properly insert a pte into the
page table into drivers is _NOT_ something I want to do again. It was a
pain to get it properly modularized, we're not going back to the horror.

Note that if there really are only 9 "nopage" routines, then it is a lot
easier to just add the single "SetPageUptodate(page)" into those 9
routines, and thus let the VM know of the race.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro



On Mon, 23 Oct 2000, Linus Torvalds wrote:

> 
> 
> On Mon, 23 Oct 2000, Alexander Viro wrote:
> > 
> > Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what
> > analysis had been done? Petr, can you reproduce the problem on -test7?
> 
> I don't think that is it - that code looks very straightforward (and is
> needed on some silly architectures that cannot easily otherwise see if
> they need to be coherent wrt user space - mainly sparc and virtual
> caches).

OK, I see where the race can happen. Yes, vmtruncate() tries to kill the
mappings. Right. However, it does that _after_ truncate_inode_pages().
And there is a window when the only lock we are holding is ->i_sem.
sync_pte in that window ==> we are fucked.

So the question being: WTF do we postpone zapping the page tables until
after the truncate_inode_pages()? The following rules might make life
simpler, AFAICS:
* as soon as ->i_size is set, no new pagetable references to
off-limits pages can appear.
* as soon as we are don with vmtruncate_list() there is no
pagetable references.
* truncate_inode_pages() never has to deal with pages refered from
pagetables.
* ->i_size can't increase until we return from vmtruncate().

It's not the only problem, but I would feel _much_ safer if pagefault
wouldn't rely on pagecache miss. Actually... Hey. Why don't we do the
insertion into page tables _within_ ->nopage()? Look: let's take the tail
of do_no_page() into helper function and just call it from the end of
every bloody ->nopage() out there. It _is_ easy: we have only 9 instances
in the tree not counting filemap_nopage(). Moreover,
do_anonymous_page() will become symmetrical to the rest of the crowd.
Comments?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Alexander Viro wrote:
> 
> Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what
> analysis had been done? Petr, can you reproduce the problem on -test7?

I don't think that is it - that code looks very straightforward (and is
needed on some silly architectures that cannot easily otherwise see if
they need to be coherent wrt user space - mainly sparc and virtual
caches).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro



Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what
analysis had been done? Petr, can you reproduce the problem on -test7?
Unfortunately, clean test would take the backport of ext2 changes
(truncate-related, happened around the same time), but IIRC -test7 was
relatively stable...


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro



On Mon, 23 Oct 2000, Linus Torvalds wrote:

> Also, the fact that Petr didn't see anything trigger in nopage() makes me
> nervous again. Even if the problem happened during read-ahead, it should
> have gotten into the address space only through nopage. Maybe there is
> some vma that isn't added to the right inode VM list - so that we end up
> missing part of the vmtruncate() stuff?
> 
> Al, any ideas?

Just one: let's slow down a bit and try to write down the rules. Close to
release or not, let's understand WTF happens in that part of VM before
deciding on the choice of band-aids/fixes. Frankly, right now I don't feel
that area - too many changes during the last month. So I'm going to sit
down and read it through. IMO it's worth the delay - I have a gut feeling
that band-aids are going to cost more.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Linus Torvalds wrote:
> 
> I'm starting to suspect that we leave this path as-is, and just fix the
> mapping case (and PageUptodate() can work there). That should also avoid
> the nasties.

..and even that looks like I'd have to do the quick-and-dirty case with
the race still there on SMP. Adding the page Uptodate logic to the VM
layer proper is too painful at this point - every single nopage function
would have to be updated to mark its page up-to-date, as they don't
generally do that currently.

Also, the fact that Petr didn't see anything trigger in nopage() makes me
nervous again. Even if the problem happened during read-ahead, it should
have gotten into the address space only through nopage. Maybe there is
some vma that isn't added to the right inode VM list - so that we end up
missing part of the vmtruncate() stuff?

Al, any ideas?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> With ClearPageDirty() kernel locked up (but no watchdog, so probably
> some livelock) during bootup after fsck /

Actually, it turns out that even with this issue fixed, there's the more
serious issue that the page _has_ to be removed from the page cache once
we get to the point that we're freeing the whole inode (which also calls
truncate_inode_pages).

Which means that we cannot take the easy way out and say "let's delay the
freeing until everything is ok" - even if a buffer is busy due to some
unfortunate timing (so that we turn the page into anonymous buffers), we'd
better get rid of the page from the page cache.

I'm starting to suspect that we leave this path as-is, and just fix the
mapping case (and PageUptodate() can work there). That should also avoid
the nasties.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> With ClearPageDirty() kernel locked up (but no watchdog, so probably
> some livelock) during bootup after fsck /.

Yeah, the way the truncate logic works right now truncate_whole_page() has
to remove the page from the inode list - otherwise truncate ends up
looping forever trying to truncate that page ;).

And there was a off-by-one error in my first untested version anyway: the
page_count() should be tested against "2", not "1", as the truncate logic
has elevated the count anyway (probably unnecessarily: once we get the
page lock nobody else can race to remove it from the page cache anyway, so
it's not as if it could go away).

> Should I try ClearPageUptodate() instead?

No, I'll have to fix the truncate logic to allow for this all.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec

On 23 Oct 00 at 14:34, Linus Torvalds wrote:
> On Mon, 23 Oct 2000, Alexander Viro wrote:
> > On Mon, 23 Oct 2000, Linus Torvalds wrote:
> > > 
> > > Nope, that just makes the race window smaller. We should check for i_size
> > > after we've gotten the page table lock and just before actually entering
> > > the page into the page tables. Otherwise we'll still race on SMP (a _very_
> > > hard window to get into, admittedly).
> > 
> > Umm... I would probably remove Uptodate upon truncate() and check _that_
> > in the place you've mentioned.
> 
> Works for me..
> 
> > > ClearPageDirty(page);
> >   ClearPageUptodate(page);
> > 
> > How about that?
> 
> Makes sense. 

With ClearPageDirty() kernel locked up (but no watchdog, so probably
some livelock) during bootup after fsck /. Unfortunately, 'PC' column
shows complete garbage - init,kswapd,kupdate and rcS in 'R', with
'PC'=current for rcS, 0xc1459f28 for init, 0xc146ffa8 for kswapd and
0xc1469fc8 for kupdate (needless to say that my kernel does not have
20MB... whee what's with PC? it now prints esp instead of eip?). Should 
I try ClearPageUptodate() instead?

Petr Vandrovec
[EMAIL PROTECTED]

P.S.: I have to go home. Continuing after 10 hours.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Alexander Viro wrote:
> 
> On Mon, 23 Oct 2000, Linus Torvalds wrote:
> > 
> > Nope, that just makes the race window smaller. We should check for i_size
> > after we've gotten the page table lock and just before actually entering
> > the page into the page tables. Otherwise we'll still race on SMP (a _very_
> > hard window to get into, admittedly).
> 
> Umm... I would probably remove Uptodate upon truncate() and check _that_
> in the place you've mentioned.

Works for me..

> > ClearPageDirty(page);
>   ClearPageUptodate(page);
> 
> How about that?

Makes sense. 

Note that I'd actually like to hear from Petr first _without_ any added
code in the nopage() handler - the issue of having a page mapped after
nopage() that we shouldn't have mapped is a separate one, and I'd first
like to hear if the problem really goes away even if the mapping bug is
still there..

And _then_ we fix the fact that we should not allow anybody to have a page
mapped past i_size.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> Yes. With sleep(60) no oops occur (it takes ~45 secs to exit child). 
> This signals to me: should not vmtruncate_list acquire mm->mmap_sem, 
> if it modifies page tables?

No. 

It should get the page_table lock, but that is sufficient for anybody who
_clears_ page tables (and is pretty much the same case as paging something
out of somebody elses page tables).

>I cannot find anything what prevents doing 
> vmtruncate in one task and filemap_sync in another - neither
> page_table_lock spinlock, nor mmap_sem semaphore...

filemap_sync() does hold the page table lock. 

(Which is certainly not to say that it's necessarily bug-free, but I don't
see any obvious problems off-hand).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro



On Mon, 23 Oct 2000, Linus Torvalds wrote:

> On Mon, 23 Oct 2000, Alexander Viro wrote:
> > 
> > That's fine, but I'm afraid that we'll need a bit more than that. A couple of
> > obvious ones:
> > * filemap_nopage() needs the second check for ->i_size. Upon exit.
> 
> Nope, that just makes the race window smaller. We should check for i_size
> after we've gotten the page table lock and just before actually entering
> the page into the page tables. Otherwise we'll still race on SMP (a _very_
> hard window to get into, admittedly).

Umm... I would probably remove Uptodate upon truncate() and check _that_
in the place you've mentioned.

> So how about this truncate_complete_page() implementation:
> 
>   /*
>* Try to get rid of a page.. Clear it if it fails
>* for some reason. The page must be locked upon calling
>* this function.
>*
>* We remove the page from the page cache _after_ we have
>* destroyed all buffer-cache references to it. Otherwise some
>* other process might think this inode page is not in the
>* page cache and creates a buffer-cache alias to it causing
>* all sorts of fun problems ...
>*/
>   static inline void truncate_complete_page(struct page *page)
>   {
>   /* Try to get rid of buffers */
>   if (page->buffers)
>   block_flushpage(page, 0);
>   
>   spin_lock(&pagecache_lock);
>   spin_lock(&pagemap_lru_lock);
>   
>   if (page_count(page) != 1) {
>   memclear_highpage_flush(page, 0, PAGE_CACHE_SIZE);
>   } else {
>   ClearPageDirty(page);
ClearPageUptodate(page);

How about that?

>   __lru_cache_del(page);
>   __remove_inode_page(page);
>   page_cache_release(page);
>   }
>   spin_unlock(&pagemap_lru_lock);
>   spin_unlock(&pagecache_lock);
>   }
> 
> we should probably special-case the "block_flushpage()" failed case, but
> the above should do reasonable things with it (because page_count() will
> be > 1 due to buffers).

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Alexander Viro wrote:
> 
> That's fine, but I'm afraid that we'll need a bit more than that. A couple of
> obvious ones:
>   * filemap_nopage() needs the second check for ->i_size. Upon exit.

Nope, that just makes the race window smaller. We should check for i_size
after we've gotten the page table lock and just before actually entering
the page into the page tables. Otherwise we'll still race on SMP (a _very_
hard window to get into, admittedly).

> Moreover, what is (area->vm_mm == current->mm) doing in the existing check?

It's for ptrace. You can do ugly things with ptrace that aren't possible
for the process itself.

>   * truncate() should zero the page out if it doesn't remove it from
> cache.

So how about this truncate_complete_page() implementation:

/*
 * Try to get rid of a page.. Clear it if it fails
 * for some reason. The page must be locked upon calling
 * this function.
 *
 * We remove the page from the page cache _after_ we have
 * destroyed all buffer-cache references to it. Otherwise some
 * other process might think this inode page is not in the
 * page cache and creates a buffer-cache alias to it causing
 * all sorts of fun problems ...
 */
static inline void truncate_complete_page(struct page *page)
{
/* Try to get rid of buffers */
if (page->buffers)
block_flushpage(page, 0);

spin_lock(&pagecache_lock);
spin_lock(&pagemap_lru_lock);

if (page_count(page) != 1) {
memclear_highpage_flush(page, 0, PAGE_CACHE_SIZE);
} else {
ClearPageDirty(page);
__lru_cache_del(page);
__remove_inode_page(page);
page_cache_release(page);
}
spin_unlock(&pagemap_lru_lock);
spin_unlock(&pagecache_lock);
}

we should probably special-case the "block_flushpage()" failed case, but
the above should do reasonable things with it (because page_count() will
be > 1 due to buffers).

The above is obviously completely and utterly untested. Petr? Willing to
give this a go?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec

On 23 Oct 00 at 13:57, Linus Torvalds wrote:
> On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> > First page->mapping == NULL entry in syslog is dated 22:23:58, but
> > couple of entries was lost before (probably I should print only '.' for
> > each such page; this run there was more than 100 such pages)
> > Another question is why SIGCHLD was delivered to parent AFTER ftruncate,
> > but exit was invoked couple of seconds before - maybe it syncs
> > child address space to disk?
> 
> exit() basically does do a msync(MSASYNC), so that could be it. 

Yes. With sleep(60) no oops occur (it takes ~45 secs to exit child). 
This signals to me: should not vmtruncate_list acquire mm->mmap_sem, 
if it modifies page tables? I cannot find anything what prevents doing 
vmtruncate in one task and filemap_sync in another - neither
page_table_lock spinlock, nor mmap_sem semaphore...
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro

On Mon, 23 Oct 2000, Linus Torvalds wrote:

> On Mon, 23 Oct 2000, Alexander Viro wrote:
> 
> > On Mon, 23 Oct 2000, Linus Torvalds wrote:
> > 
> > > Al, any ideas? I have this feeling that the simplest fix is just to leave
> > > the race open, and make truncate_complete_page() just leave such a "racy"
> > > page in the page cache. It will still race, and the invalid page will
> > > still exist, but the end result should be harmless.
> > 
> > Provided that we clean it - why the hell do we want to take it out of
> > the pagecache? I don't see any fundamental reasons to prohibit pages
> > past the ->i_size being hashed.
> 
> In fact, we used to allow them.
> 
> We do want to remove it from the page cache under normal circumstances: it
> makes for much better MM behaviour (ie free pages that are truly useless).
> But I suspect that just adding a test for "page_count(page) == 1" (ie same
> as for the page cache invalidation) before freeing it should give that
> advantage, along with avoiding the problematic unlikely case...

That's fine, but I'm afraid that we'll need a bit more than that. A couple of
obvious ones:
* filemap_nopage() needs the second check for ->i_size. Upon exit.
Moreover, what is (area->vm_mm == current->mm) doing in the existing check?
* truncate() should zero the page out if it doesn't remove it from
cache.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> Yes. Bad news. No problem was catched in filemap_nopage, but one
> (of 57000) pages was dirty and had page->mapping == NULL... (maybe
> only one was caused that this was just after bootup, with plenty of memory)
> Maybe I should look at readahead code?

Yes, you're right, read-ahead is in fact even more likely to catch the
race.

> In case of truncate it is going irrevocable away. Accesses after truncate
> should (and sometime give you) SIGBUS...

Yes. But I'd rather have a small race where we under certain strange
circumstances forget to raise a SIGBUS than have a kernel bug that causes
oops and possible filesystem corruption. 

So I'm not just looking for a fix, I'm also looking for a SIMPLE fix, with
perhaps some major surgery later (things like making the inode semaphore a
read-write semaphore and just always synchronizing 100% on i_size - which
would fix the problem, but is not a 2.4.x thing, no way Jose).

> First page->mapping == NULL entry in syslog is dated 22:23:58, but
> couple of entries was lost before (probably I should print only '.' for
> each such page; this run there was more than 100 such pages)
> Another question is why SIGCHLD was delivered to parent AFTER ftruncate,
> but exit was invoked couple of seconds before - maybe it syncs
> child address space to disk?

exit() basically does do a msync(MSASYNC), so that could be it. 

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Alexander Viro wrote:

> On Mon, 23 Oct 2000, Linus Torvalds wrote:
> 
> > Al, any ideas? I have this feeling that the simplest fix is just to leave
> > the race open, and make truncate_complete_page() just leave such a "racy"
> > page in the page cache. It will still race, and the invalid page will
> > still exist, but the end result should be harmless.
> 
> Provided that we clean it - why the hell do we want to take it out of
> the pagecache? I don't see any fundamental reasons to prohibit pages
> past the ->i_size being hashed.

In fact, we used to allow them.

We do want to remove it from the page cache under normal circumstances: it
makes for much better MM behaviour (ie free pages that are truly useless).
But I suspect that just adding a test for "page_count(page) == 1" (ie same
as for the page cache invalidation) before freeing it should give that
advantage, along with avoiding the problematic unlikely case...

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec

On 23 Oct 00 at 16:13, Alexander Viro wrote:
> On Mon, 23 Oct 2000, Linus Torvalds wrote:
> 
> > Al, any ideas? I have this feeling that the simplest fix is just to leave
> > the race open, and make truncate_complete_page() just leave such a "racy"
> > page in the page cache. It will still race, and the invalid page will
> > still exist, but the end result should be harmless.
> 
> Provided that we clean it - why the hell do we want to take it out of
> the pagecache? I don't see any fundamental reasons to prohibit pages
> past the ->i_size being hashed. Methods _must_ check for ->i_size,
> but they do it anyway. All race-prevention is based on page locks and
> ->i_sem.
> 
> Yes, filemap_nopage() should check for i_size at the very end and fail if
> the page became off-limits. But that's completely unrelated issue - it's
> mmap semantics, not pagecache one.

Yes. Bad news. No problem was catched in filemap_nopage, but one
(of 57000) pages was dirty and had page->mapping == NULL... (maybe
only one was caused that this was just after bootup, with plenty of memory)
Maybe I should look at readahead code? Although to be clear I do not 
know why. Unless there is bug in logic in test program, it should 
first dirty pages, and AFTER that it should truncate - and unmap and 
exit, without ever touching pages of mapping... My first testcases were 
with this race (and with raw devices), but then I found (by removing 
more and more code) that no race (and no raw devices) are required...
 
> The point being: we should _never_ drop ->mapping unless the page is
> irrevocably going away. We can (and probably should) drop the off-limits
> page as soon as ->count hits zero, but we should not do it before that.

In case of truncate it is going irrevocable away. Accesses after truncate
should (and sometime give you) SIGBUS...

 total   used   free sharedbuffers cached
Mem:255768  42208 213560  0496  18420
-/+ buffers/cache:  23292 232476
Swap:   530136  13200 516936

Strace of another run:

1688  22:23:41.748438 execve("./oopsdemo", ["./oopsdemo"], [/* 18 vars */]) = 0
1688  22:23:41.749058 brk(0)= 0x8049ae8
1688  22:23:41.749399 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40016000
1688  22:23:41.749641 open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file 
or directory)
1688  22:23:41.749861 open("/etc/ld.so.cache", O_RDONLY) = 4
1688  22:23:41.750011 fstat(4, {st_mode=S_IFREG|0644, st_size=43818, ...}) = 0
1688  22:23:41.750253 old_mmap(NULL, 43818, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40017000
1688  22:23:41.750408 close(4)  = 0
1688  22:23:41.750538 open("/lib/libc.so.6", O_RDONLY) = 4
1688  22:23:41.750676 fstat(4, {st_mode=S_IFREG|0755, st_size=1057576, ...}) = 0
1688  22:23:41.750878 read(4, 
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\224\314"..., 4096) = 4096
1688  22:23:41.751173 old_mmap(NULL, 1072484, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) 
= 0x40022000
1688  22:23:41.751327 mprotect(0x4011e000, 40292, PROT_NONE) = 0
1688  22:23:41.751441 old_mmap(0x4011e000, 24576, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED, 4, 0xfb000) = 0x4011e000
1688  22:23:41.751633 old_mmap(0x40124000, 15716, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40124000
1688  22:23:41.751809 close(4)  = 0
1688  22:23:41.753291 munmap(0x40017000, 43818) = 0
1688  22:23:41.753554 getpid()  = 1688
1688  22:23:41.753785 fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(4, 1), ...}) = 0
1688  22:23:41.753993 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
1688  22:23:41.754162 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
1688  22:23:41.754430 write(1, "Go\n", 3) = 3
1688  22:23:41.754727 open("ram0", O_RDWR|O_CREAT, 0600) = 4
1688  22:23:41.754949 unlink("ram0")= 0
1688  22:23:41.755103 ftruncate(4, 234881024) = 0
1688  22:23:41.755228 old_mmap(NULL, 234881024, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 
0) = 0x40128000
1688  22:23:41.755700 pipe([5, 6])  = 0
1688  22:23:41.755847 pipe([7, 8])  = 0
1688  22:23:41.756024 fork()= 1689
1689  22:23:41.756735 close(6 
1688  22:23:41.756798 close(5 
1689  22:23:41.756844 <... close resumed> ) = 0
1688  22:23:41.756894 <... close resumed> ) = 0
1689  22:23:41.756949 close(7 
1688  22:23:41.756997 close(8 
1689  22:23:41.757041 <... close resumed> ) = 0
1688  22:23:41.757089 <... close resumed> ) = 0
1689  22:23:41.757139 close(4 
1688  22:23:41.757188 write(6, "\0", 1 
1689  22:23:41.757250 <... close resumed> ) = 0
1688  22:23:41.757301 <... write resumed> ) = 1
1689  22:23:41.757355 read(5,  
1688  22:23:41.757405 read(7,  
1689  22:23:41.757450 <... read resumed> "\0", 1) = 1
1689  22:23:49.260756 write(8, "\0", 1) = 1
1689  22:23:49.436204 read(5,  
1688  22:23:49.442969 <... read resumed> "\

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro

On Mon, 23 Oct 2000, Linus Torvalds wrote:

> Al, any ideas? I have this feeling that the simplest fix is just to leave
> the race open, and make truncate_complete_page() just leave such a "racy"
> page in the page cache. It will still race, and the invalid page will
> still exist, but the end result should be harmless.

Provided that we clean it - why the hell do we want to take it out of
the pagecache? I don't see any fundamental reasons to prohibit pages
past the ->i_size being hashed. Methods _must_ check for ->i_size,
but they do it anyway. All race-prevention is based on page locks and
->i_sem.

Yes, filemap_nopage() should check for i_size at the very end and fail if
the page became off-limits. But that's completely unrelated issue - it's
mmap semantics, not pagecache one.

The point being: we should _never_ drop ->mapping unless the page is
irrevocably going away. We can (and probably should) drop the off-limits
page as soon as ->count hits zero, but we should not do it before that.

Comments?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> > I'll take a better look at the truncate case (I consider the invalidate
> > case closed). Do you have a simple test-program around?
> 
> Well, I cannot say simple. As I was not able to reproduce it with only
> one task, code below:

Ok, without running this I can already guess at what's up.

One process does the "truncate()", and races with another process that
does a page-in.

The truncate code does

notify_change(ATTR_SIZE);

which will basically cause a "vmtruncate(inode, newsize)".

Just before this happened, the other process gets a page fault, and does a
page_cache_read(). Now, because we are low on memory (this is why it only
shows up when you're swapping), that read will take a while. During that
time, the vmtruncate() starts to execute, and sets inode->i_size to the
new value (that would cause us to no longer accept a page fault - but we
already got past that check in the faulting process). It will then
invalidate all the inode pages > i_size.

Now, the page faulting process comes back, and puts the page into the VM
space. Never mind that it has in the meantime gotten i_mapping = NULL due
to the other process doing the truncate.

Now, if I'm right, you should be able to add something like

if (!old_page->mapping)
printk("mapping went away from under us\n");

to just before the "return old_page()" case in the success path of
filemap_nopage() (mm/filemap.c), and you should see that printk() trigger
when the bug happens.

(There are other users too that are not synchronized wrt inode size
changes and could get an access to a page past the end of the file this
way - I just think that filemap_nopage is probably the one where this is
most easily seen).

The above is obviously not a bug-fix, it's just a validation of the
theory.

Al, any ideas? I have this feeling that the simplest fix is just to leave
the race open, and make truncate_complete_page() just leave such a "racy"
page in the page cache. It will still race, and the invalid page will
still exist, but the end result should be harmless.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec

On 23 Oct 00 at 12:09, Linus Torvalds wrote:
> On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> > 
> > Hi Linus,
> >   unfortunately, this does not explain problem I reported. In my case,
> > underlying fs is ext2, and there is no locking around - only one task
> > does ftruncate() on (big) shareable mapped file (original code does it to
> > prevent dirty pages to be written to disk after their contents is no
> > longer relevant).
> 
> Yes. Yours is a separate problem, and is not due to invalidate_pages() at
> all, but apparently due to truncate_complete_page().
> 
> Basically, all the same old arguments do apply - we cannot just remove the
> page from the page cache if it is mapped.
> 
> Now, the truncate() case is different, because the code _tries_ to also
> zap the actual mappings. The fact that it fails to do so is a bit
> unnerving, actually. 
> 
> I'll take a better look at the truncate case (I consider the invalidate
> case closed). Do you have a simple test-program around?

Well, I cannot say simple. As I was not able to reproduce it with only
one task, code below:
(1) mlocks itself in memory (you can remove it)
(2) creates 224MB file and maps it shareable into memory (224 is
for 256MB machine; I was not able to reproduce it with less than 100MB
file on 256MB machine) (and you must have 224MB of free disk space
in current directory, otherwise...) and unlinks it
(3) fork
(4) child process dirties memory
(5) parent signals child to exit and wait 5 secs
(6) child exits (hopefully...)
(7) parent unmaps data
(8) parent exits
You'll always get page->mapping == NULL dereference... And btw, if
I start code below on my machine (dual PIII/450), after 2 secs of run
whole machine stops - only ping and console switching works - and stays
that way for more than 20 secs. After that program finishes (killing
kernel if no antioops in filemap_write_page present...) and everything
returns to normal (if "-/+ buffers/cache: 28848  226920" is normal ;-) ).

  You can look at "http://platan.vc.cvut.cz/listit" for dump of "struct page"
at time when page->mapping == NULL was catched in filemap_write_page,
or through linux-kernel@ archive searching for 'page->mapping == NULL'
during last month, either from me or from Quintella.
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]


#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

unsigned char zero;

#define MSIZE 0x0E00

void x4768(void) {
int fd;
pid_t pid;
int x4778_1[2];
int x4778_2[2];
int from4778;
int to4778;
char* mma[65536];
unsigned int mml[65536];
unsigned int mmi = 0;
unsigned int ln = 0;
unsigned int x;

#define MAL2(a,l) ftruncate(fd, ln+l); mml[mmi] = l; mma[mmi++] = mmap(a, l, 
PROT_READ|PROT_WRITE, MAP_SHARED, fd, ln); ln += l; mma[mmi-1][0] = 99;
#define MAL(l) MAL2(NULL,l)

fd = open("ram0", O_RDWR | O_CREAT, 0600);
unlink("ram0");
MAL(MSIZE);
pipe(x4778_1);
pipe(x4778_2);
pid = fork();
if (!pid) {
int from4768 = x4778_1[0];
int to4768 = x4778_2[1];
close(x4778_1[1]);
close(x4778_2[0]);
close(fd);
read(from4768, &zero, 1);
for (x = 0; x < mml[mmi - 1]; x += 4096)
mma[mmi-1][x] = 98;
write(to4768, &zero, 1);
read(from4768, &zero, 1);
exit(0);

} else if (pid < 0) {
perror("fork failed");
exit(222);
}
to4778 = x4778_1[1];
from4778 = x4778_2[0];
close(x4778_1[0]);
close(x4778_2[1]);

{
write(to4778, &zero, 1);
read(from4778, &zero, 1);

write(to4778, &zero, 1);
sleep(5);
ftruncate(fd, 0);
}
while (mmi--) {
munmap(mma[mmi], mml[mmi]);
}
exit(0);
}

int main(int argc, char* argv[]) {
printf("Go\n");
mlockall(MCL_CURRENT);

x4768();
return 0;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds



On Mon, 23 Oct 2000, Petr Vandrovec wrote:
> 
> Hi Linus,
>   unfortunately, this does not explain problem I reported. In my case,
> underlying fs is ext2, and there is no locking around - only one task
> does ftruncate() on (big) shareable mapped file (original code does it to
> prevent dirty pages to be written to disk after their contents is no
> longer relevant).

Yes. Yours is a separate problem, and is not due to invalidate_pages() at
all, but apparently due to truncate_complete_page().

Basically, all the same old arguments do apply - we cannot just remove the
page from the page cache if it is mapped.

Now, the truncate() case is different, because the code _tries_ to also
zap the actual mappings. The fact that it fails to do so is a bit
unnerving, actually. 

I'll take a better look at the truncate case (I consider the invalidate
case closed). Do you have a simple test-program around?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec

On 23 Oct 00 at 10:33, Linus Torvalds wrote:
> Trond Myklebust  <[EMAIL PROTECTED]> wrote:
> >
> >As for simply settling for a self-consistent mmap() rather than
> >tackling the problem of rereading; the main crime is that you're
> >rendering file locking unusable.
...
> This is not a crime.
> 
...
> And yes, I know that file locking can try to be nice in both of these
> cases.  I know that there are systems that try to revert shared mappings
> etc upon file locking (or not allow file locking at all if mappings are
> present).  Linux is not one of them, and never has been. Neither are
> most UNIXes out there.
> Linus

Hi Linus,
  unfortunately, this does not explain problem I reported. In my case,
underlying fs is ext2, and there is no locking around - only one task
does ftruncate() on (big) shareable mapped file (original code does it to
prevent dirty pages to be written to disk after their contents is no
longer relevant). And for some reason, not all instance of shared mapping 
are invalidated - some are still connected to underyling pages, but 
these pages have page->mapping = NULL... (and no, in testcase there
is no vmware around). So on task exit kernel dies with page->mapping
dereference in filemap_write_page (->filemap_sync->filemap_unmap->exit_mmap->
->mmput->do_exit)...
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Trond Myklebust  <[EMAIL PROTECTED]> wrote:
>
>As for simply settling for a self-consistent mmap() rather than
>tackling the problem of rereading; the main crime is that you're
>rendering file locking unusable.

This is not a crime.

Anybody who uses file locking over NFS is buggy and nobody sane should
expect it to work reliably. There are good reasons why mail agents etc
depend on other kinds of locking over NFS.

Furthermore, anybody who expects file locking to work _anywhere_ (ie
including local filesystems) in the presense of file mapping is just so
out to lunch that it's ridiculous. 

Neither of these issues have anything to do with Linux. They are just
statements of fact.

And yes, I know that file locking can try to be nice in both of these
cases.  I know that there are systems that try to revert shared mappings
etc upon file locking (or not allow file locking at all if mappings are
present).  Linux is not one of them, and never has been. Neither are
most UNIXes out there.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: PC speaker driver patch for 2.4.0-test10-pre3

2000-10-23 Thread Oliver Xymoron

On Mon, 23 Oct 2000, David Woodhouse wrote:

> 
> [EMAIL PROTECTED] said:
> >  You can also pretty trivially keep track of an error term so that the
> > clock is right on average: 
> 
> True, but I don't want 'right on average'. I want 'not screwed with at all'.
> Shifting the timer tick onto the RTC will give me that. 
> 
> Even if we _do_ get the maths right, fudging with the frequency of the 
> system timer is still an ugly hack.

I personally think the system timer is already an ugly hack. HZ is
arbitrary, slow by modern standards, and introduces latency.
As the comment you quoted points out, it's also not very accurate.  
Much better would be an agenda structure with one shot timers between
events and jiffies based on cycle counters. This works on modern hardware
and scales well for higher processor speeds. 

As for the current inaccuracy, the jitter introduced by using an error
term can never be more than a single real interrupt cycle, which is
already well below the average userspace latency.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: PC speaker driver patch for 2.4.0-test10-pre3

2000-10-23 Thread David Woodhouse


[EMAIL PROTECTED] said:
>  You can also pretty trivially keep track of an error term so that the
> clock is right on average: 

True, but I don't want 'right on average'. I want 'not screwed with at all'.
Shifting the timer tick onto the RTC will give me that. 

Even if we _do_ get the maths right, fudging with the frequency of the 
system timer is still an ugly hack.



--
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: PC speaker driver patch for 2.4.0-test10-pre3

2000-10-23 Thread Oliver Xymoron

On Mon, 23 Oct 2000, David Woodhouse wrote:

> See arch/i386/kernel/time.c:
> 
> /* This function must be called with interrupts disabled 
>  * It was inspired by Steve McCanne's microtime-i386 for BSD.  -- jrs
>  * 
>  * However, the pc-audio speaker driver changes the divisor so that
>  * it gets interrupted rather more often - it loads 64 into the
>  * counter rather than 11932! This has an adverse impact on
>  * do_gettimeoffset() -- it stops working! What is also not
>  * good is that the interval that our timer function gets called
>  * is no longer 10.0002 ms, but 9.9767 ms. To get around this
>  * would require using a different timing source.

You can also pretty trivially keep track of an error term so that the
clock is right on average:

#define CLK_HZ 1193200 /* or whatever */
#define ET_PRECISION 1000
#define ET_TIMER_TICKS CLK_HZ*ET_PRECISION/HZ
#define ET_PCSP_TICKS 64*ET_PRECISION
...
static long et=0;

...
et+=ET_PCSP_TICKS;
if(et>ET_TIMER_TICKS) {
et-=ET_TIMER_TICKS;
simulate_timer_tick();
}

This keeps track of the residual timing error to a few decimal places to
several decimal places and will get closer to HZ than even the current
code.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: PC speaker driver patch for 2.4.0-test10-pre3

2000-10-23 Thread David Woodhouse


[EMAIL PROTECTED] said:
> Apparently Linus doesn't like the way it handles interrupts or
> something, and is therefore 'wrong.' ::shrug:: As long as it's
> available though, I'll use it ;p 

It's not just 'wrong'. It's 'sick'. But it works. The correct fix is to
shift the system timer onto the RTC, leaving the 8253 timer available for us
to abuse for the PC speaker.

See arch/i386/kernel/time.c:

/* This function must be called with interrupts disabled 
 * It was inspired by Steve McCanne's microtime-i386 for BSD.  -- jrs
 * 
 * However, the pc-audio speaker driver changes the divisor so that
 * it gets interrupted rather more often - it loads 64 into the
 * counter rather than 11932! This has an adverse impact on
 * do_gettimeoffset() -- it stops working! What is also not
 * good is that the interval that our timer function gets called
 * is no longer 10.0002 ms, but 9.9767 ms. To get around this
 * would require using a different timing source. Maybe someone
 * could use the RTC - I know that this can interrupt at frequencies
 * ranging from 8192Hz to 2Hz. If I had the energy, I'd somehow fix
 * it so that at startup, the timer code in sched.c would select
 * using either the RTC or the 8253 timer. The decision would be
 * based on whether there was any other device around that needed
 * to trample on the 8253. I'd set up the RTC to interrupt at 1024 Hz,
 * and then do some jiggery to have a version of do_timer that 
 * advanced the clock by 1/1024 s. Every time that reached over 1/100
 * of a second, then do all the old code. If the time was kept correct
 * then do_gettimeoffset could just return 0 - there is no low order
 * divider that can be accessed.
 *
 * Ideally, you would be able to use the RTC for the speaker driver,
 * but it appears that the speaker driver really needs interrupt more
 * often than every 120 us or so.
 *
 * Anyway, this needs more thought  pjsg (1993-08-28)
 * 
 * If you are really that interested, you should be reading
 * comp.protocols.time.ntp!
 */


--
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: PC speaker driver patch for 2.4.0-test10-pre3

2000-10-22 Thread Mike A. Harris

On Sun, 22 Oct 2000, Dr. Kelsey Hudson wrote:

>Date: Sun, 22 Oct 2000 13:39:09 -0700 (PDT)
>From: Dr. Kelsey Hudson <[EMAIL PROTECTED]>
>To: Mike A. Harris <[EMAIL PROTECTED]>
>Cc: David Woodhouse <[EMAIL PROTECTED]>,
> Linux Kernel mailing list <[EMAIL PROTECTED]>
>Content-Type: TEXT/PLAIN; charset=US-ASCII
>Subject: Re: PC speaker driver patch for 2.4.0-test10-pre3
>
>> Is there a major compelling reason that this patch isn't included
>> in the standard kernel tree?
>> 
>> 
>> --
>>   Mike A. Harris  -  Linux advocate  -  Open source advocate
>>   Computer Consultant - Capslock Consulting
>>  Copyright 2000 all rights reserved
>> --
>
>Apparently Linus doesn't like the way it handles interrupts or something,
>and is therefore 'wrong.' ::shrug:: As long as it's available though, I'll
>use it ;p

Someone has explained to me why it isn't included, and it made
decent sense to me.  Probably best left as a patch...


--
  Mike A. Harris  -  Linux advocate  -  Open source advocate
  Computer Consultant - Capslock Consulting
 Copyright 2000 all rights reserved
--

#[Mike A. Harris bash tip #1 - separate history files per virtual console]
# Put the following at the bottom of your ~/.bash_profile
[ ! -d ~/.bash_histdir ] && mkdir ~/.bash_histdir
tty |grep "^/dev/tty[0-9]" >& /dev/null && \
export HISTFILE=~/.bash_histdir/.$(tty | sed -e 's/.*\///')

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: PC speaker driver patch for 2.4.0-test10-pre3

2000-10-22 Thread Dr. Kelsey Hudson

> Is there a major compelling reason that this patch isn't included
> in the standard kernel tree?
> 
> 
> --
>   Mike A. Harris  -  Linux advocate  -  Open source advocate
>   Computer Consultant - Capslock Consulting
>  Copyright 2000 all rights reserved
> --

Apparently Linus doesn't like the way it handles interrupts or something,
and is therefore 'wrong.' ::shrug:: As long as it's available though, I'll
use it ;p

 Kelsey Hudson   [EMAIL PROTECTED] 
 Software Engineer
 Compendium Technologies, Inc   (619) 725-0771
--- 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



HD Benchmark 2.2.17(ide_patch)/2.4.0-test10-pre3 AMD v1.2 ide driver

2000-10-21 Thread bruj0

Greetings all:
I was doing some benchmark between kernels and i came out with
this results, for what I can see, 2.2.17 with the ide_patch is faster,
although it uses more CPU, but then again i could be wrong, any ideas?
-Rodrigo

2.4.0-test10-pre3 FS=ext2 AMD v1.2 (viper) IDE DRIVE

Version 1.00d   --Sequential Output-- --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Ono-Sendai.lin 248M  6713  98 23372  17  7515   9  6814  94 16644  12  40.4 0
--Sequential Create-- RandomCreate
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 30   256  99 + +++  8282 100   258  99 + +++  1005 99
--
2.2.17-ide_patch-reiser_patch FS=ext2
Version 1.00d   
--Sequential Output-- --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Ono-Sendai.lin 248M  6458  94 23786  19  6662  13  6941  95 17266  11  43.1   0
   --Sequential Create-- Random Create
   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
   30   261  99   636  99  6466  99   259  99   801 100   373  57
--
lspci output:
00:07.1 IDE interface: Advanced Micro Devices [AMD]: Unknown device 7409 (rev 07)
128mb RAM
/dev/hda: (western digital 20gig udma(66))

 Model=WDC WD205AA, FwRev=05.05B05, SerialNo=WD-WMA0W1681543
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=3(DualPortCache), BuffSize=2048kB, MaxMultSect=16, MultSect=off
 DblWordIO=no, maxPIO=2(fast), DMA=yes, maxDMA=0(slow)
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes
 LBA CHS=1023/256/63 Remapping, LBA=yes, LBAsects=40079088
 tDMA={min:120,rec:120}, DMA modes: mword0 mword1 mword2
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, PIO modes: mode3 mode4
 UDMA modes: mode0 mode1 mode2

-
Webmaster of http://www.securityportal.com.ar
[EMAIL PROTECTED] 
 /"\
 \ / ASCII Ribbon Campaign
  X  Against HTML Mail
 / \
  Proud member of http://www.undersec.com
-


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



[PATCH] Re: 2.4.0-test10-pre3:mkdir() returns -EIO when ext2 filesystem full

2000-10-21 Thread Andries Brouwer

On Sat, Oct 21, 2000 at 02:14:21PM +0100, Russell King wrote:

> Something weird is going on with VFS return codes with kernel
> 2.4.0-test10-pre3:
> 
> [root@sturm glibc-2.1.92]# df /var/tmp
> Filesystem   1k-blocks  Used Available Use% Mounted on
> /dev/hda3   127383127383 0 100% /var
> [root@sturm glibc-2.1.92]# mkdir /var/tmp/tst
> mkdir: cannot create directory `/var/tmp/tst': Input/output error
> 
> Just a guess, but I think this is caused by:
> 
> dir_block = ext2_bread (inode, 0, 1, &err);
> if (!dir_block) {
> inode->i_nlink--; /* is this nlink == 0? */
> mark_inode_dirty(inode);
> iput (inode);
> return -EIO;
> }
> 
> in fs/ext2/namei.c:ext2_mkdir().  Note that I can't find any cases where
> ext2_alloc_block() would actually give a ENOSPC return.
> 
> Note (again) the dodgy return error by reference, and then we totally
> ignore 'err', so even if ext2_alloc_block() was fixed, it wouldn't
> matter unless this and probably other places were also fixed to return
> err instead of EIO.

Ah, but I do find cases where we would get ENOSPC.
So, the following patch, anticipated by you, fixes this.

diff -u --recursive --new-file ../linux-2.4.0test9/linux/fs/ext2/namei.c 
./linux/fs/ext2/namei.c
--- ../linux-2.4.0test9/linux/fs/ext2/namei.c   Wed Sep 27 22:41:33 2000
+++ ./linux/fs/ext2/namei.c Sat Oct 21 22:59:59 2000
@@ -437,7 +437,7 @@
inode->i_nlink--; /* is this nlink == 0? */
mark_inode_dirty(inode);
iput (inode);
-   return -EIO;
+   return err;
}
de = (struct ext2_dir_entry_2 *) dir_block->b_data;
de->inode = cpu_to_le32(inode->i_ino);

[if one is very cautious,
return err ? err : -EIO
would be safest, but I believe that all paths leading here
in fact do set err]

Andries
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test10-pre3

2000-10-21 Thread Rik van Riel

On 16 Oct 2000, H. Peter Anvin wrote:
> Followup to:  <[EMAIL PROTECTED]>
> By author:[EMAIL PROTECTED]
> In newsgroup: linux.dev.kernel
> > 
> > > but intel refuse to guarantee they wont produce an actual '786' class
> > > CPU.
> > 
> > Worry about that if/when it happens ?
> 
> Dare one guess the 786 is actually the Itanic in x86 mode?

In that case I guess we won't have to worry about it
happening ;)

*runs like hell*

cheers,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4.0-test10-pre3:mkdir() returns -EIO when ext2 filesystem full

2000-10-21 Thread Russell King

Hi,

Something weird is going on with VFS return codes with kernel
2.4.0-test10-pre3:

[root@sturm glibc-2.1.92]# df /var/tmp
Filesystem   1k-blocks  Used Available Use% Mounted on
/dev/hda3   127383127383 0 100% /var
[root@sturm glibc-2.1.92]# df -i /var/tmp
FilesystemInodes   IUsed   IFree IUse% Mounted on
/dev/hda3  329122936   299769% /var
[root@sturm glibc-2.1.92]# mkdir /var/tmp/tst
mkdir: cannot create directory `/var/tmp/tst': Input/output error
[root@sturm glibc-2.1.92]#

Just a guess, but I think this is caused by:

dir_block = ext2_bread (inode, 0, 1, &err);
if (!dir_block) {
inode->i_nlink--; /* is this nlink == 0? */
mark_inode_dirty(inode);
iput (inode);
return -EIO;
}

in fs/ext2/namei.c:ext2_mkdir().  Note that I can't find any cases where
ext2_alloc_block() would actually give a ENOSPC return.

Note (again) the dodgy return error by reference, and then we totally
ignore 'err', so even if ext2_alloc_block() was fixed, it wouldn't
matter unless this and probably other places were also fixed to return
err instead of EIO.
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro



On Thu, 19 Oct 2000, Linus Torvalds wrote:

> You'd have to do something like
> 
>   LockPage(page); /* Nobody gets to write to this page (except 
>through mmaps, ugh) */
>   gather_all_mmap_users(page);/* THIS is the nasty one */

Wait a second. invalidate_inode_pages() has no idea of range, right? Finding
all VMAs of shared mappings for given inode is not a big deal. Sure,
repeating it for each bloody page would be painful at extreme, but
"make sure that every access to address within _that_ VMA will result in
pagefault" looks like a reasonable operation. Basically, we have
two sides - pagecache and many VMAs. And loop over VMAs with per-VMA
operations sounds more reasonable than loop over pages. We _can_ block
pageins without messing with pages themselves (it starts with finding VMA,
after all) and we can block new shared mappings (not a big deal).

Comments? Basically, I propose per-VMA rw-semaphore taken on page-in for
read and on that "flush and make sure we'll reread" operation for write +
rw-sem on i_mmap_shared. We could even make invalidation work for less
than whole file, all we need for that is skipping the VMAs out of the
range. Ingo, Linus?

>   nfs_wb_page(page);  /* force write-back on this page */
>   ClearPageUptodate(page);/* mark it not up-to-date to force a read-in 
>next time */
>   UnlockPage(page);   /* Ok, now the client can go wild */
> 
> where everything but the "gather_all_mmap_users()" part is fairly
> straightforward.  The "gather" phase is nasty - it would need to figure
> out every place the page is mapped, make sure those are synchronized (ie
> something like marking the page table entry write-protected and causing a
> TLB invalidate SMP cross-call - at which point the resulting page fault
> and the page lock will catch anybody who tries to write to the page)..
 
> In no case could you do something like what the current
> invalidate_inode_pages() does, which is to just try to drop the page from
> the cache - that really only works if we're the only user of that page,
> which the "page_count != 1" test now enforces.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust

> " " == Alexander Viro <[EMAIL PROTECTED]> writes:

 > Again, consider the case when two processes share the
 > mapping. Process A has page faulted in. Page is
 > invalidated. Process B tries to access the same page. If you
 > leave it in page tables of A you _MUST_ leave it in
 > cache. Period. Otherwise A and B will have different instances
 > of the page.

Even so, you want to reread it. When I said automatically msync(), I
meant 'schedule a write of what has changed and then do whatever you
need to do to get the page reread'. (You explicitly asked for
semantics, not an implementation)

 > And rereading the thing might be tolerable _only_ if there is
 > another client that had changed the file.  Even if you msync()
 > everything, you have to deal with plain and boring memory
 > modifications done by a process that did that bloody mmap(). If
 > they happen while you are reading the data from server - too
 > fscking bad, you'ld better have a good excuse for destroying
 > the data. write() from another client _is_ a good excuse. But
 > from my reading of fs/nfs/* it looks like we do that (cache
 > invalidation) left, right and center in cases that have nothing
 > to another clients.

That's because under NFS you don't have a cache consistency
protocol. Nothing tells you that the file/directory has changed and
that you have to resync your cache. Instead, you have to infer it from
the fact that some operation has returned file attributes that are
screwy.
In addition you may want to force a reread, because some operation
just changed a directory on the server, and you don't know what else
changed beforehand.

 > IOW, I think that invalidate_inode_pages() is bogus. There is
 > only one situation when we have a right to remove page from
 > pagecache - when it is not mapped anywhere.

The issue is not about removing pages. It's about forcing a reread of
the cached data from the server. Removing the actual pages from the
cache has so far been the only race-free method for doing this (since
pre-2.2.x at least) while ensuring that at least generic 'read',
'readdir' and 'write' work as expected.
Yes it screws up mmap() and should be fixed but without breaking what
little that works please.

As for simply settling for a self-consistent mmap() rather than
tackling the problem of rereading; the main crime is that you're
rendering file locking unusable.
Locking is the case in which you have to issue a guarantee that the
cache is consistent between client and server within the area covered
by the lock. In all other cases you *could* get away with the partial
cache invalidation implementation by arguing that there no consistency
guarantees inherent in the protocol.

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro



On Fri, 20 Oct 2000, Trond Myklebust wrote:

> For the general case of the page cache I think we can keep them quite
> simple:
> 
> + We do in any case want to drop all pages that are unreferenced. (The
> reason for flushing may be that the file size has changed.)
> 
> + For pages that are referenced (and unlocked) we would like to force
> them to get read in anew ASAP. How this is done in practice is
> irrelevant as far as NFS is concerned provided that we don't sleep on
> any I/O while in nfs_zap_caches()/invalidate_inode_pages().
> 
> The lower level stuff can and will sort out the business of flushing
> out pending writebacks that conflict with the read, so that isn't a
> problem for the VFS/VM.
> 
> The problem lies with writes that haven't yet been msync()ed (and
> hence do not have writebacks). For shared mappings, one should perhaps
> schedule an automatic msync() of the dirty pages (???). For private
> mappings, perhaps the best thing would be to defer the read?

Again, consider the case when two processes share the mapping. Process A
has page faulted in. Page is invalidated. Process B tries to access the
same page. If you leave it in page tables of A you _MUST_ leave it in
cache. Period. Otherwise A and B will have different instances of the
page.

It's not about writebacks. If you map something with MAP_SHARED and
fork() afterwards you _MUST_ have the same data at the address returned by
mmap() until one of the processes unmaps the thing.

And rereading the thing might be tolerable _only_ if there is another
client that had changed the file.  Even if you msync() everything, you
have to deal with plain and boring memory modifications done by a process
that did that bloody mmap(). If they happen while you are reading the data
from server - too fscking bad, you'ld better have a good excuse for
destroying the data. write() from another client _is_ a good
excuse. But from my reading of fs/nfs/* it looks like we do that (cache
invalidation) left, right and center in cases that have nothing to another
clients.

IOW, I think that invalidate_inode_pages() is bogus. There is only one
situation when we have a right to remove page from pagecache - when it is
not mapped anywhere.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro



On 20 Oct 2000, Trond Myklebust wrote:

> > " " == Russell King <[EMAIL PROTECTED]> writes:
> 
>  > invalidate_inode_pages nfs_zap_caches nfs_lock fcntl_setlk
>  > do_fcntl sys_fcntl
> 
>  > So I guess that NFS locking is really bad if the region is
>  > mmapped!
> 
> Yep, but that's a symptom, not a cause. We want to be able to run
> invalidate_inode_pages() safely at any moment, since the need can be
> triggered externally (because the server and client page caches
> disagree).

So what exactly do you want it to do when page is mapped by user process?
Should it remain visible or not? What should happen if process writes to
that page?

Trond, I'm not asking about implementation - the question being what
semantics do you want for nfs_zap_caches() wrt user-mapped pages.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro



On 20 Oct 2000, Trond Myklebust wrote:

> Under NFS the problem is that pages can (and *should*) be invalidated
> despite there being pending write backs. The server can trigger the
> need for a cache invalidation at any time.

OK, so what should happen if user does mmap() on NFS file, dirties the
page and server tells that page should be invalidated? Should we still see
that page in process' address space? Silently making it anonymous is the
worst thing possible - it's still accessible through process page tables
and you (a) still see the (allegedly) invalidated data and (b) have no way
in hell to get the cache coherency between processes. I.e. you are in for
situations when we do mmap(), fork(), write data in child and attempt to
read from that address in parent and child yield different
results. Moreover, future attempts to modify the data by either of them
will be invisible in another.

IOW, what do you really want from the invalidation? It looks like the
right thing _might_ be "remove from all page tables", but that can become
really interesting with mlock() and friends.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro



On Fri, 20 Oct 2000, Roger Larsson wrote:

> Is it legal/good practice to unmap the file after closing it?
Yes.

> (Since the sharing needs the fd to mmap it)

It doesn't. Mapping needs struct file * and it doesn't care about
fd. mmap() takes a reference to struct file by fd you've passed and after
that we can forget about descriptors - vma_struct holds a reference to
file and that's it.

> Successful unlinking a file should probably free pages directly to
> free list - might be worth optimizing for.

_Definitely_ no. Hell, unlink() doesn't mean that anything happens with
data - unlink an opened file and if that affects the read/write/lseek
you've found a bug. And mapping is equivalent to having opened descriptors
in that respect.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro



On Fri, 20 Oct 2000, Trond Myklebust wrote:

> > " " == David S Miller <[EMAIL PROTECTED]> writes:
> 
>  > Actually, judging by the trace you provided Russell, I'd say
>  > this is some peculiarity with NFS silly rename handling, and
>  > it'd be best to look for the bug in that code (early inode
>  > reference loss, for example?)
> 
> Russel's trace indicates that the unlink actually has completed and
> has become a negative dentry since the file is labelled '(deleted)'.

No, it doesn't. Proof:

exec 3>/tmp/foo
rm /tmp/foo
ls -l /proc/$$/fd/3

and dentry is _not_ negative.

> That means that the dentry count must have been zero, so that
> dentry_iput() was called.

It doesn't and it wasn't.

> I don't see how dentry_iput() can be called on an open file. In
> principle the dentry count should always be >= 1, so unless there is
> some place where we're calling d_delete() without get()ing the dentry
> first, there should be no path for early inode loss.

d_delete() will not make dentry negative if there are other
users. However, it will make it unhashed in such case, ergo
(deleted) thing.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds


[ Final comment, and then I'll shut up ]

On Thu, 19 Oct 2000, Linus Torvalds wrote:
> 
> You'd have to do something like
> 
>   LockPage(page); /* Nobody gets to write to this page (except 
>through mmaps, ugh) */
>   gather_all_mmap_users(page);/* THIS is the nasty one */
>   nfs_wb_page(page);  /* force write-back on this page */
>   ClearPageUptodate(page);/* mark it not up-to-date to force a read-in 
>next time */
>   UnlockPage(page);   /* Ok, now the client can go wild */

Note that one approach would be to just make invalidate_inode_pages() do
exactly the above.

Now, the gather_all_mmap_users() part is definitely a post-2.4.x thing,
and we can't do nfs_wb_page() in invalidate_inode_page() either, but what
we CAN do is to clear the Uptodate flag (and we do need the page lock to
do so).

The advantage of clearing the uptodate flag (as opposed to doing what we
do now - dropping the page altogether) is that there would be no cache
aliasing issues, and there would be no issues with a page and its
associated data just "disappearing" from under somebody. It would cause
the page to be read in again the next time it is faulted in or somebody
does a read() on it, and that's exactly what we want.

However, we _do_ need to WB the page data some way - but the decision on
whether to invalidate write-backs or finish them would have to be up to
the low-level filesystem.

(We should _not_ allow people to read in the page, and leave some stale
write-back information there, so that we'd write back stuff that we just
read because somebody else noticed that the page was not up-to-date. That
would be just an endless source of (a) confusion and (b) unnecessary
network traffic.)

We could, of course, have some combination of the two: if there is only
one user (us), we can just drop the page, otherwise we can mark it
non-up-to-date to force future readers to re-validate the dang thing.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds



On Thu, 19 Oct 2000, Linus Torvalds wrote:
> 
> I'm saying that we're much much better off guaranteeing local consistency
> over knowingly breaking local consistency over a uncertain global
> consistency issue. Especially as NFS has never guaranteed global
> consistency in the first place, and was not designed to do that.

Btw, if you really worry about NFS v3 and want to add consistency
guarantees to NFS v3, you can probably do that. It's just that you cannot
do it with a simple cache invalidation - you'd need to be much much more
careful (the same way a simple CPU cache invalidate does not guarantee SMP
cache consistency - it only guarantees data corruption due to missed
writes).

I don't know what the best way to do a true cache consistency protocol on
a page leve would be, especially as you will have a _really_ hard time
getting hold of an exclusive pointer to a page with multiple mmap's, but
you could probably get something that guarantees cache consistency as long
as people do _not_ expect mmap to always be 100% consistent.

(The reason you cannot get mmap consistency is just that mmap doesn't have
any kernel synchronization point unless you're willing to shoot down every
single mapping - which is expensive as hell, but doable).

You'd have to do something like

LockPage(page); /* Nobody gets to write to this page (except 
through mmaps, ugh) */
gather_all_mmap_users(page);/* THIS is the nasty one */
nfs_wb_page(page);  /* force write-back on this page */
ClearPageUptodate(page);/* mark it not up-to-date to force a read-in 
next time */
UnlockPage(page);   /* Ok, now the client can go wild */

where everything but the "gather_all_mmap_users()" part is fairly
straightforward.  The "gather" phase is nasty - it would need to figure
out every place the page is mapped, make sure those are synchronized (ie
something like marking the page table entry write-protected and causing a
TLB invalidate SMP cross-call - at which point the resulting page fault
and the page lock will catch anybody who tries to write to the page)..

In no case could you do something like what the current
invalidate_inode_pages() does, which is to just try to drop the page from
the cache - that really only works if we're the only user of that page,
which the "page_count != 1" test now enforces.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust

> " " == Linus Torvalds <[EMAIL PROTECTED]> writes:

 > which is really really bad, because now you have the case that
 > you have 'n' copies of the same page in memory, with 'n' users,
 > out of which 'n-1' users have the wrong page. And those 'n-1'
 > users don't even have any way of _knowing_ that they have the
 > wrong page.

 > Which is why we MUST NOT drop a page that has users. Really.

 > I'm telling you that cases #4 and #5 are _much_ worse than your
 > "solution" to case #2. And you argue that your solution is good
 > only because you're completely ignoring cases #4 and #5.

No. I'm arguing (at 4:40am and while trying to keep one eye on our
detector's data acquisition) on the basis that whoever holds the file
lock has to have a guarantee of obtaining 100% accuracy on the locked
region.

I agree that dropping pages is ugly and that it will always give
problems with shared mmap(), so if it can be shown that clearing
PG_uptodate and rereading the same page will give the required
guarantee on locking, then I'm not going to complain.

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds



On 20 Oct 2000, Trond Myklebust wrote:
> 
> The problem here is that NFS pages have 3 rather than 2 states:
>   1) mmapped & correct.
>   2) mmapped & incorrect. (but possibly dirty)
>   3) Unmapped
> 
> For case 1), we clearly want to have the page in inode->i_mapping.
> For cases 2) & 3) we don't.

I think you're WRONG, WRONG, WRONG.

Your case #2 is not at all a state.

It's "mapped, possibly dirty, and locally correct. But _maybe_ globally
incorrect".

If you remove the page in case #2 from the hashing, you will have case #4,
which you're ignoring:

   4) Totally, and utterly incorrect. Without any way for the application
  to even _know_ that it's incorrect.

Note that case 4 is accompanied by case #5, later on:

   5) two separate pages both mapped, one hashed and up-to-date, the other
  mapped in other processes but incorrect.

which is really really bad, because now you have the case that you have
'n' copies of the same page in memory, with 'n' users, out of which 'n-1'
users have the wrong page. And those 'n-1' users don't even have any way
of _knowing_ that they have the wrong page.

Which is why we MUST NOT drop a page that has users. Really.

I'm telling you that cases #4 and #5 are _much_ worse than your "solution"
to case #2. And you argue that your solution is good only because you're
completely ignoring cases #4 and #5.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds



On 20 Oct 2000, Trond Myklebust wrote:
> 
>  > invalidate_inode_pages nfs_zap_caches nfs_lock fcntl_setlk
>  > do_fcntl sys_fcntl
> 
>  > So I guess that NFS locking is really bad if the region is
>  > mmapped!
> 
> Yep, but that's a symptom, not a cause. We want to be able to run
> invalidate_inode_pages() safely at any moment, since the need can be
> triggered externally (because the server and client page caches
> disagree).

Well, the thing is, that if somebody has a page mapped, there's nothing we
can do if the server and client disagrees. 

We just cannot invalidate the page - people would actually lose data, and
would lose major local consistency guarantees that UNIX filesystems are
supposed to get. 

So it's a matter of one sort of consistency against another - and I think
the local consistency requirements are the stricter ones, considering that
we don't even _know_ whether the server has really changed those pages or
not..

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds



On 20 Oct 2000, Trond Myklebust wrote:
> 
> Under NFS the problem is that pages can (and *should*) be invalidated
> despite there being pending write backs. The server can trigger the
> need for a cache invalidation at any time.
> The existence of file locks that aren't page aligned, as well as
> partial page writeback ensure that we cannot make the equivalence
>page has pending write == page is correct.

Note that this is not what my patch really says at all.

My patch says "ok, we should invalidate this page, because somebody thinks
it may be bad. HOWEVER, we cannot reasonably do that, because we have
users of the page (which may be completely unrelated to pending writes),
and invalidating this page is _certain_ to cause corruption".

Basically think of the case of somebody having a shared mmap over NFS. He
has dirty data in his pages, and he expects those to be written out to the
server. There is NO QUESTION about this fact.

Imagine somebody (possibly even the same person) then releasing or getting
a file lock, possibly in some other part of the file. 

Without that added test, all those dirty local pages just got completely
thrown away. 

Now, you have a choice: you can KNOW that you're doing something horribly
and utterly wrong (throwing away data), or you can SUSPECT that you might
cause non-coherency between the client and the server. You cannot get
both.

I'm saying that we're much much better off guaranteeing local consistency
over knowingly breaking local consistency over a uncertain global
consistency issue. Especially as NFS has never guaranteed global
consistency in the first place, and was not designed to do that.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust

> " " == Linus Torvalds <[EMAIL PROTECTED]> writes:


 > The advantage of clearing the uptodate flag (as opposed to
 > doing what we do now - dropping the page altogether) is that
 > there would be no cache aliasing issues, and there would be no
 > issues with a page and its associated data just "disappearing"
 > from under somebody. It would cause the page to be read in
 > again the next time it is faulted in or somebody does a read()
 > on it, and that's exactly what we want.

I've proposed this in the past, but there you refused on the grounds
that it breaks assumptions in the VFS. I'm otherwise happy with this
sort of thing. It ensures both shared mmap() and cache consistency.

The only problem I can think of would be for dirty pages that haven't
yet been msync()ed, however I'm far from being competent to evaluate
any side-effects on the VM.

 > However, we _do_ need to WB the page data some way - but the
 > decision on whether to invalidate write-backs or finish them
 > would have to be up to the low-level filesystem.

This is already in place.

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust

> " " == Linus Torvalds <[EMAIL PROTECTED]> writes:

 > Btw, that "invalidate_inode_pages()" thing is just wrong - we
 > can't just remove pages that are mapped etc, because that would
 > result in no end of fun aliasing problems etc.


 > How about adding a test in invalidate_inode_pages() like

 >  /* We cannot invalidate a locked page */ if
 >  (TryLockPage(page))
 >  continue;

 > + /* We cannot invalidate a page that is in use */
 > + if (page_count(page) != 1) {
 > + UnlockPage(page);
 > + continue;
 > + }
 > +
 >  __lru_cache_del(page); __remove_inode_page(page);

The problem here is that NFS pages have 3 rather than 2 states:
  1) mmapped & correct.
  2) mmapped & incorrect. (but possibly dirty)
  3) Unmapped

For case 1), we clearly want to have the page in inode->i_mapping.
For cases 2) & 3) we don't.

However for case 2) we still have a weak association to the inode
itself, and we want to be able to reference inode metadata etc.  Would
it make sense then to remove these pages from i_mapping, but to hang
them onto a new struct address_space (call it i_unmapped for want of a
better name)?

That would allow you to keep a consistent state for the page, while
still allowing you to 'invalidate' it (by removing it from the
i_mapping) and hence maintain a consistent cache.

invalidate_inode_pages() would then reduce to

   remove_page_from_inode_queue(page);
   remove_page_from_hash_queue(page);
   if (page_count(page))
 add_page_to_inode_unmapped(page);

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Russell King

Alexander Viro writes:
> Trond, I'm not asking about implementation - the question being what
> semantics do you want for nfs_zap_caches() wrt user-mapped pages.

Ok, looking through sendmail, and then db2, the situation is
created by the db2 library.  If the process does the following:

1. creates NFS file
2. sets file size by lseek + write
3. maps it
4. fcntl locks the file (writes above data back to file)
5. write to the mapping (makes pages dirty)
6. fcntl unlocks it (dirty page data is NOT written back)
7. closes it
8. unlinks it
9. some time later unmaps it(causes dirty data to be written back)

Note that (6) doesn't act as a barrier to synchronise writes in this case,
but it does for any normal write()s.  Surely NFS should cause any dirty
data associated with the file to be written back to the server no matter
what?

Although Linus' fix seems to prevent the problem, I get the feeling that it
is a sticky plaster over a much bigger problem.
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Petr Vandrovec

On 19 Oct 00 at 16:32, Linus Torvalds wrote:

> How about adding a test in invalidate_inode_pages() like
> 
> /* We cannot invalidate a locked page */
> if (TryLockPage(page))
> continue;
> 
> +   /* We cannot invalidate a page that is in use */
> +   if (page_count(page) != 1) {
> +   UnlockPage(page);
> +   continue;
> +   }
> +
> __lru_cache_del(page);
> __remove_inode_page(page);
 
Hi Linus,
  this did not fix problem with my testcase - 
mmap(shared),fork,dirty_it,ftruncate,finish - does NOT go through
this code (invalidate_inode_pages) at all (should it?).

  I instrumented both swapout in page_io, and __remove_inode_page,
and pages are going through __remove_inode_page with strange use
count. I do not know whether it is good or no... During normal life
of system, pages with count==2 && index==0 are passed to __remove_inode_page.
As soon as I start my test, non-zero index pages with count > 2 are
passed here. Later this pages appear in filemap_sync...
Thanks,
Petr Vandrovec
[EMAIL PROTECTED]

vana:~# free
 total   used   free sharedbuffers cached
Mem:255768 203232  52536  0   4084  35496
-/+ buffers/cache: 163652  92116
Swap:   530136  0 530136
vana:~# free
 total   used   free sharedbuffers cached
Mem:255768 203376  52392  0   4084  35636
-/+ buffers/cache: 163656  92112
Swap:   530136  0 530136

vana:~# free
 total   used   free sharedbuffers cached
Mem:255768  72172 183596  0   4084  32888
-/+ buffers/cache:  35200 220568
Swap:   530136  0 530136
vana:~# free
 total   used   free sharedbuffers cached
Mem:255768  72180 183588  0   4084  32888
-/+ buffers/cache:  35208 220560
Swap:   530136  0 530136


11:57:44 Bad page count in __remove_inode_page!
11:57:44   page: c1349ef8
11:57:44 mapping:  cc65c6dc
11:57:44 index:0
11:57:44 nexthash: 
11:57:44 count:2
11:57:44 flags:0x0009
11:57:44 age:  0
11:57:44 pprevhash: 
11:57:44 buffers:  
11:57:44 virtual:  cc61a000
11:57:44 zone: c021f578
11:57:44 pagedump done
11:57:44 Bad page count in __remove_inode_page!
11:57:44   page: c1120394
11:57:44 mapping:  cc6b2edc
11:57:44 index:32767
11:57:44 nexthash: 
11:57:44 count:3
11:57:44 flags:0x0009
11:57:44 age:  5
11:57:44 pprevhash: 
11:57:44 buffers:  
11:57:44 virtual:  c43d1000
11:57:44 zone: c021f578
11:57:44 pagedump done

11:57:46 Bad page count in __remove_inode_page!
11:57:46   page: c1266298
11:57:46 mapping:  cc6b2edc
11:57:46 index:13635
11:57:46 nexthash: 
11:57:46 count:3
11:57:46 flags:0x0009
11:57:46 age:  5
11:57:46 pprevhash: 
11:57:46 buffers:  
11:57:46 virtual:  c9082000
11:57:46 zone: c021f578
11:57:46 pagedump done

11:57:46 page->mapping == NULL
11:57:46 error:  -22
11:57:46 ptep:   c92f6a2c
11:57:46 pteval: 09062027
11:57:46 vma:ce926420
11:57:46   vm_mm:ccd67d60
11:57:46   vm_start: 40128000
11:57:46   vm_end:   48128000
11:57:46   vm_next:  c1490b20
11:57:46   vm_avl_height:  1
11:57:46   vm_avl_left:
11:57:46   vm_avl_right:   
11:57:46   vm_next_share:  
11:57:46   vm_pprev_share: ccdee684
11:57:46   vm_operations_struct: c021f1a0
11:57:46   vm_pgoff:  
11:57:46   vm_file:   cd4270e0
11:57:46   vm_raend:  
11:57:46   vm_private_data:  
11:57:46 address:4368B000
11:57:46 flags:  0001
11:57:46   file: cd4270e0
11:57:46 dentry:   cc8ae440
11:57:46   inode: cc6b2e40
11:57:46   num: 848677
11:57:46   dev: 0x0302
11:57:46   path: /usr/src/tst/ram0 (deleted)
11:57:46 vfsmount: c14ca8c0
11:57:46 op:   c0221bc0
11:57:46 count:4
11:57:46 flags:0x0002
11:57:46 mode: 03
11:57:46 pos:  0
11:57:46 reada:0
11:57:46 ramax:0
11:57:46 raend:0
11:57:46 ralen:0
11:57:46 rawin:0
11:57:46 owner.pid: 0
11:57:46 owner.uid: 0
11:57:46 owner.euid: 0
11:57:46 owner.signum: 0
11:57:46 uid:  0
11:57:46 gid:  0
11:57:46 error:0
11:57:46 version:  11641
11:57:46 private_data: 
11:57:46   page: c1265a18
11:57:46 mapping:  
11:57:46 index:   

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust

> " " == Alexander Viro <[EMAIL PROTECTED]> writes:

 > So what exactly do you want it to do when page is mapped by
 > user process?  Should it remain visible or not? What should
 > happen if process writes to that page?

 > Trond, I'm not asking about implementation - the question being
 > what semantics do you want for nfs_zap_caches() wrt user-mapped
 > pages.

For the general case of the page cache I think we can keep them quite
simple:

+ We do in any case want to drop all pages that are unreferenced. (The
reason for flushing may be that the file size has changed.)

+ For pages that are referenced (and unlocked) we would like to force
them to get read in anew ASAP. How this is done in practice is
irrelevant as far as NFS is concerned provided that we don't sleep on
any I/O while in nfs_zap_caches()/invalidate_inode_pages().

The lower level stuff can and will sort out the business of flushing
out pending writebacks that conflict with the read, so that isn't a
problem for the VFS/VM.

The problem lies with writes that haven't yet been msync()ed (and
hence do not have writebacks). For shared mappings, one should perhaps
schedule an automatic msync() of the dirty pages (???). For private
mappings, perhaps the best thing would be to defer the read?

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds



On Fri, 20 Oct 2000, Trond Myklebust wrote:
> 
> The problem lies with writes that haven't yet been msync()ed (and
> hence do not have writebacks). For shared mappings, one should perhaps
> schedule an automatic msync() of the dirty pages (???). For private
> mappings, perhaps the best thing would be to defer the read?

Note that NONE of this is going to happen for 2.4.x.

We've never _ever_ done this before, there's no point in even suggesting
that this is suddenly a "critical" bug. It's not.

I want to know what the suggestion for 2.4.x is. Right now that's the "if
the count is elevated, we don't invalidate". 

Quite frankly, I don't see any other option. Doing the !Uptodate version
will lose local data as it stands now - in fact right now you'd lose data
that way even if you are the only client accessing the file, which is
obviously complete crap and _completely_ unacceptable.

I'm open to suggestions, but I haven't heard anything realistic.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust

> " " == Linus Torvalds <[EMAIL PROTECTED]> writes:

 > How about adding a test in invalidate_inode_pages() like

 >  /* We cannot invalidate a locked page */ if
 >  (TryLockPage(page))
 >  continue;

 > + /* We cannot invalidate a page that is in use */
 > + if (page_count(page) != 1) {
 > + UnlockPage(page);
 > + continue;
 > + }
 > +
 >  __lru_cache_del(page); __remove_inode_page(page);

 > Because otherwise we might end up invalidating pages that might
 > have pending write-back etc (although I think the NFS logic
 > tries to avoid invalidating when there's pending activity).

You may remember that I've been fighting a rearguard action against
this for some time.

Under NFS the problem is that pages can (and *should*) be invalidated
despite there being pending write backs. The server can trigger the
need for a cache invalidation at any time.
The existence of file locks that aren't page aligned, as well as
partial page writeback ensure that we cannot make the equivalence
   page has pending write == page is correct.

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust

> " " == Russell King <[EMAIL PROTECTED]> writes:

 > invalidate_inode_pages nfs_zap_caches nfs_lock fcntl_setlk
 > do_fcntl sys_fcntl

 > So I guess that NFS locking is really bad if the region is
 > mmapped!

Yep, but that's a symptom, not a cause. We want to be able to run
invalidate_inode_pages() safely at any moment, since the need can be
triggered externally (because the server and client page caches
disagree).

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust

> " " == Russell King <[EMAIL PROTECTED]> writes:

 > Indeed.  page->mapping is set to NULL in two places, one in
 > __remove_inode_pages() and the other one in the swap code after
 > we've checked that it was NULL.  I hadn't found the particular
 > call trace that caused us to ended up in __remove_inode_page()
 > though with this page.

Have you tried tagging invalidate_inode_pages() with a debugging
statement?

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust

> " " == David S Miller <[EMAIL PROTECTED]> writes:

 > Actually, judging by the trace you provided Russell, I'd say
 > this is some peculiarity with NFS silly rename handling, and
 > it'd be best to look for the bug in that code (early inode
 > reference loss, for example?)

Russel's trace indicates that the unlink actually has completed and
has become a negative dentry since the file is labelled '(deleted)'.
That means that the dentry count must have been zero, so that
dentry_iput() was called.

I don't see how dentry_iput() can be called on an open file. In
principle the dentry count should always be >= 1, so unless there is
some place where we're calling d_delete() without get()ing the dentry
first, there should be no path for early inode loss.

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King

Linus Torvalds writes:
> Can you get a full stack trace?

filemap_write_page
filemap_sync
filemap_unmap
do_munmap
sys_munmap

> How about adding a test in invalidate_inode_pages() like

(Added, along with a call to drop a stack trace out).

Yes, this does stop the problem in filemap_write_page.  The stack trace
here shows:

invalidate_inode_pages
nfs_zap_caches
nfs_lock
fcntl_setlk
do_fcntl
sys_fcntl

So I guess that NFS locking is really bad if the region is mmapped!
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Linus Torvalds

In article <8snvuj$1l0$[EMAIL PROTECTED]>,
Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>Hmm.. Looks like page->mapping was cleared by truncate_inode_pages()
>when the inode was free'd, and there was still write-back activity on
>one of the pages in question. 

Looking some more, the fact that the oops happens in filemap_sync_pte()
definitely means that we haven't gotten around to truncating the sucker
yet. So invalidate_inode_pages() looks like the more likely cause,
especially as the page writeback correctly increases the file count for
NFS, so we should never be able to get into the horror-schenario of
having write-back after the pages have been truncated due to the inode
getting cleared.

Does the simple added test for page counts to invalidate_inode_pages()
fix the problem for you, Russell?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Trond Myklebust  <[EMAIL PROTECTED]> wrote:
>> " " == Petr Vandrovec <[EMAIL PROTECTED]> writes:
>
> > You do not have to use NFS - look for my postings with
> > 'page->mapping == NULL' in archive. Your code uses shared mmap,
> > 'page->isn't
> > it? Probably shared by couple of processes.
>
>It's probably particularly nasty under NFS because of
>invalidate_inode_pages().

Btw, that "invalidate_inode_pages()" thing is just wrong - we can't just
remove pages that are mapped etc, because that would result in no end of
fun aliasing problems etc.

I don't see why invalidate_inode_pages() would be called for this
sequence, though, which is why I'd be more likely to blame the truncate
case, but I think you're right about invalidate_inode_pages() being a
problem none-the-less.

How about adding a test in invalidate_inode_pages() like

/* We cannot invalidate a locked page */
if (TryLockPage(page))
continue;

+   /* We cannot invalidate a page that is in use */
+   if (page_count(page) != 1) {
+   UnlockPage(page);
+   continue;
+   }
+
__lru_cache_del(page);
__remove_inode_page(page);

Because otherwise we might end up invalidating pages that might have
pending write-back etc (although I think the NFS logic tries to avoid
invalidating when there's pending activity).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Russell King  <[EMAIL PROTECTED]> wrote:
>Petr Vandrovec writes:
>> ... or from sys_exit() if you forget to unmap. Or from anywhere if
>> swapping code decides to swap such page. I'm trying to hunt it down
>> for more than month, but I have no idea what's wrong. In my case
>> way to trigger bug is:
>
>I actually think its as simple as:
>
>1. shared map a file
>2. close file
>3. unlink file
>4. unmap shared mapping
>
>In my case, I was running newaliases, and there wasn't any other
>processes using that mapping.

Hmm.. Looks like page->mapping was cleared by truncate_inode_pages()
when the inode was free'd, and there was still write-back activity on
one of the pages in question. 

Looking at the problem, it appears that the unmap() will (correctly) do
a writepage() for each dirty page, and that will (again correctly) cause
NFS to do a delayed write-out with all the proper things (increment page
count because of the pending write etc). The inode hasn't been free'd
yet, because we're still using it.

But then when we _do_ release the inode, we end up doing
truncate_inode_pages() and through that we clear page->mapping, so the
pending write-out is SOL.  I have this feeling that we probably should
NOT remove the mapping until the page count has gone down to zero. 

The other alternative would be to flush the pending write-outs (or wait
for them synchronously - but that's just stupid and inefficient), which
would imply that we'd better add the "flush" operation to "struct
address_space_operations". 

Can you get a full stack trace?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread David S. Miller

   From: Russell King <[EMAIL PROTECTED]>
   Date:Fri, 20 Oct 2000 00:07:55 +0100 (BST)

   Trond Myklebust writes:
   > It's probably particularly nasty under NFS because of
   > invalidate_inode_pages(). The latter empties the page cache whenever
   > we can no longer trust it and calls remove_inode_page() on every
   > unlocked page. It won't care whether the page is mmapped or not.
   > 
   > My guess is therefore that the line setting 'page->mapping = NULL' in
   > __remove_inode_page() is a candidate for scrutiny...

   Indeed.  page->mapping is set to NULL in two places, one in
   __remove_inode_pages() and the other one in the swap code after we've
   checked that it was NULL.  I hadn't found the particular call trace
   that caused us to ended up in __remove_inode_page() though with this
   page.

Actually, judging by the trace you provided Russell, I'd say this is
some peculiarity with NFS silly rename handling, and it'd be best to
look for the bug in that code (early inode reference loss, for
example?)

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King

Trond Myklebust writes:
> It's probably particularly nasty under NFS because of
> invalidate_inode_pages(). The latter empties the page cache whenever
> we can no longer trust it and calls remove_inode_page() on every
> unlocked page. It won't care whether the page is mmapped or not.
> 
> My guess is therefore that the line setting 'page->mapping = NULL' in
> __remove_inode_page() is a candidate for scrutiny...

Indeed.  page->mapping is set to NULL in two places, one in
__remove_inode_pages() and the other one in the swap code after we've
checked that it was NULL.  I hadn't found the particular call trace
that caused us to ended up in __remove_inode_page() though with this
page.
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust

> " " == Petr Vandrovec <[EMAIL PROTECTED]> writes:

 > You do not have to use NFS - look for my postings with
 > 'page->mapping == NULL' in archive. Your code uses shared mmap,
 > 'page->isn't
 > it? Probably shared by couple of processes.

It's probably particularly nasty under NFS because of
invalidate_inode_pages(). The latter empties the page cache whenever
we can no longer trust it and calls remove_inode_page() on every
unlocked page. It won't care whether the page is mmapped or not.

My guess is therefore that the line setting 'page->mapping = NULL' in
__remove_inode_page() is a candidate for scrutiny...

Cheers,
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King

Roger Larsson writes:
> Will it work correctly if 4. is done before 3. (even before 2?)
> Is it legal/good practice to unmap the file after closing it?
> (Since the sharing needs the fd to mmap it)

Dunno, haven't tried that yet.  I'll have a go tomorrow, but I think
it'll work correctly.  I'll also dig about in the VFS / NFS code some
more to see if I can find anything obvious.

> I would have expected this order:
>   A. open file
>   B. shared map file
>   C. unmap shared mapping
>   D. close file
>   E. unlink file

BTW, the program which is giving grief is good ol' sendmail, v8.11.
It does the unmap after close all over the place.
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Roger Larsson

Russell King wrote:
> 
> Petr Vandrovec writes:
> > ... or from sys_exit() if you forget to unmap. Or from anywhere if
> > swapping code decides to swap such page. I'm trying to hunt it down
> > for more than month, but I have no idea what's wrong. In my case
> > way to trigger bug is:
> 
> I actually think its as simple as:
> 
  0. open file
> 1. shared map file
> 2. close file
> 3. unlink file
> 4. unmap shared mapping

Will it work correctly if 4. is done before 3. (even before 2?)
Is it legal/good practice to unmap the file after closing it?
(Since the sharing needs the fd to mmap it)

I would have expected this order:
  A. open file
  B. shared map file
  C. unmap shared mapping
  D. close file
  E. unlink file

But in your case the unlink should be deferred to unmap time...
(if it is legal)

Successful unlinking a file should probably free pages directly to
free list - might be worth optimizing for.

/RogerL


Disclaimer:
 I do not really understand fs code...

> 
> In my case, I was running newaliases, and there wasn't any other
> processes using that mapping.
>_
>   |_| - ---+---+-
>   |   | Russell King[EMAIL PROTECTED]  --- ---
>   | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
>   | +-+-+ --- -+-
>   /   |   THE developer of ARM Linux  |+| /|\
>  /  | | | ---  |
> +-+-+ -  /\\\  |
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> Please read the FAQ at http://www.tux.org/lkml/

--
Home page:
  http://www.norran.net/nra02596/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King

Petr Vandrovec writes:
> ... or from sys_exit() if you forget to unmap. Or from anywhere if
> swapping code decides to swap such page. I'm trying to hunt it down
> for more than month, but I have no idea what's wrong. In my case
> way to trigger bug is:

I actually think its as simple as:

1. shared map a file
2. close file
3. unlink file
4. unmap shared mapping

In my case, I was running newaliases, and there wasn't any other
processes using that mapping.
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Petr Vandrovec

On 19 Oct 00 at 22:16, Russell King wrote:

> I'm seeing an oops caused by a NULL pointer dereference in mm/filemap.c,
> filemap_write_page.  The NULL pointer in question is page->mapping.
> The box on which this is happening is using a root NFS filesystem (in
> fact all but one of its filesystems are NFS).

You do not have to use NFS - look for my postings with 
'page->mapping == NULL' in archive. Your code uses shared mmap, isn't
it? Probably shared by couple of processes.

> So the file that was mapped has been deleted...  Ok, 
> With the following in filemap_sync_pte:
> 
> this reveals:
> 
> --> page c017e118 mapping NULL - address 40264000
> --> vma->vm_file = c0c8b120
> --> /var/tmp/.nfs16130002 (deleted)
> 
> And the stack trace indicates that the kernel execution path came from
> sys_munmap.

... or from sys_exit() if you forget to unmap. Or from anywhere if
swapping code decides to swap such page. I'm trying to hunt it down
for more than month, but I have no idea what's wrong. In my case
way to trigger bug is:

(1) shared map some page
(2) unlink underlying file
(3) fork
(4)   from child dirty pages
(5)   from parent truncate file
(6) exit both tasks
(7) oops

Just in case you'll not find oops demo code in archive... Last tested 
under 2.4.0-test10-pre3. With -pre4 I cannot decide whether to comment 
out BUG() in vmscan, or not, so I did not test -pre4 yet... Maybe you
can simplify code even more. During tests (dual PIII/450, 256MB RAM)
system is completely dead for up to 40 secs, so do not panic. It is not
needed to hit reset button, just wait ;-) Any idea welcomed.
Best regards,
Petr Vandrovec
[EMAIL PROTECTED]

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

unsigned char zero;

#define MSIZE 0x0E00

void x4768(void) {
int fd;
pid_t pid;
int x4778_1[2];
int x4778_2[2];
int x4779_1[2];
int x4779_2[2];
int from4778;
int to4778;
int from4779;
int to4779;
char* mma[65536];
unsigned int mml[65536];
unsigned int mmi = 0;
unsigned int ln = 0;
unsigned int x;

#define MAL2(a,l) ftruncate(fd, ln+l); mml[mmi] = l; mma[mmi++] = mmap(a, l, 
PROT_READ|PROT_WRITE, MAP_SHARED, fd, ln); ln += l; mma[mmi-1][0] = 99;
#define MAL(l) MAL2(NULL,l)

fd = open("ram0", O_RDWR | O_CREAT, 0600);
unlink("ram0");
MAL(MSIZE);
pipe(x4778_1);
pipe(x4778_2);
pid = fork();
if (!pid) {
int from4768 = x4778_1[0];
int to4768 = x4778_2[1];
close(x4778_1[1]);
close(x4778_2[0]);
close(fd);
read(from4768, &zero, 1);
for (x = 0; x < mml[mmi - 1]; x += 4096)
mma[mmi-1][x] = 98;
write(to4768, &zero, 1);
read(from4768, &zero, 1);
exit(0);

} else if (pid < 0) {
perror("fork failed");
exit(222);
}
to4778 = x4778_1[1];
from4778 = x4778_2[0];
close(x4778_1[0]);
close(x4778_2[1]);

/* so that read/write to this fails... */
from4779 = -1;
to4779 = -1;
{
write(to4778, &zero, 1);
write(to4779, &zero, 1);
read(from4778, &zero, 1);
read(from4779, &zero, 1);

write(to4778, &zero, 1);
write(to4779, &zero, 1);
sleep(5);
ftruncate(fd, 0);
}
close(fd);
while (mmi--) {
munmap(mma[mmi], mml[mmi]);
}
exit(0);
}

int main(int argc, char* argv[]) {
printf("Go\n");
mlockall(MCL_CURRENT);

x4768();
return 0;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_page

2000-10-19 Thread Russell King

Hi,

I'm seeing an oops caused by a NULL pointer dereference in mm/filemap.c,
filemap_write_page.  The NULL pointer in question is page->mapping.
The box on which this is happening is using a root NFS filesystem (in
fact all but one of its filesystems are NFS).

So the file that was mapped has been deleted...  Ok, 
With the following in filemap_sync_pte:

if (page->index != pgoff) {
printk("weirdness: pgoff=%lu index=%lu address=%lu vm_start=%lu 
vm_pgoff=%lu\n",
pgoff, page->index, address, vma->vm_start, vma->vm_pgoff);
}
+   if (!page->mapping) {
+   printk("--> page %p mapping NULL - address %lx\n", page, address);
+   printk("--> vma->vm_file = %p\n", vma->vm_file);
+   if (vma->vm_file) {
+   char *s = d_path(vma->vm_file->f_dentry, 
+vma->vm_file->f_vfsmnt, buf, PAGE_SIZ$
+   printk("--> %s\n", s);
+   }
+   page_cache_free(page);
+   return 0;
+   }
lock_page(page);

this reveals:

--> page c017e118 mapping NULL - address 40264000
--> vma->vm_file = c0c8b120
--> /var/tmp/.nfs16130002 (deleted)

And the stack trace indicates that the kernel execution path came from
sys_munmap.

(and the box doesn't hang, so I can then get the following strace and
send you this mail... should've kept the old sendmail binary lying
around...):

open("/var/tmp/000703", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
fcntl(5, F_SETFD, FD_CLOEXEC)   = 0
lseek(5, 0, SEEK_END)   = 0
lseek(5, 49152, SEEK_CUR)   = 49152
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
old_mmap(NULL, 53248, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x40264000
fcntl(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = 0
fcntl(5, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=1}) = 0
close(5)= 0
unlink("/var/tmp/000703")   = 0
...
munmap(0x40264000, 53248)   = 0

So I'm guessing that the unlink is causing NFS to remove page->mapping
for a dirty mmap'd file, which then causes munmap to explode.

I'm not a VFS, MM nor NFS wizard, so it'll probably take me many hours
to figure out what's causing this.  Any chance someone else can take a
look?
   _
  |_| - ---+---+-
  |   | Russell King[EMAIL PROTECTED]  --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+ --- -+-
  /   |   THE developer of ARM Linux  |+| /|\
 /  | | | ---  |
+-+-+ -  /\\\  |

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



[PATCH] fixes to Pentium IV detection in test10-pre3

2000-10-18 Thread Mikael Pettersson

Here's a patch which should improve Pentium IV support in test10-pre.

The test10-pre3 patch detects Pentium IV (family code 15) but resets
boot_cpu_data.x86 to 6 in this case.

The advantage of doing this is that the places in the kernel which
construct cpu family names (i386 .. i686) still work: the Pentium IV
will be announced as an i686. (Inaccurate, but who cares? These
names are meaningless anyway. Why should a K7 be i686?)

The BIG disadvantage is that low-level cpu-specific drivers break if
boot_cpu_data.x86 doesn't equal the family code returned by CPUID.

The patch contains the following fixes/changes:
* arch/i386/kernel/microcode.c:
  - make check for Intel P6 family be strict
* arch/i386/kernel/setup.c:
  - (Pentium IV) remove broken "c->x86 = 6;" assignment, so c->x86
remains == 15
  - (Pentium IV) don't goto name_decoded, return instead; otherwise
the name grabbed from the extended cpuid levels will be clobbered
  - print cpu family as integer not char in /proc/cpuinfo
* include/asm-i386/bugs.h:
  - the test for K6 B-step chips was missing a test for family==5
  - init of system_utsname.machine maps family codes > 6 to 6;
if anyone can suggest a better naming scheme for newer cpus
like P4 and the Itanium's embedded PentiumIII let me know.
* include/asm-i386/elf.h:
  - same fix as in the system_utsname.machine init code

Alan wrote:
> What about the other proc stuff. This will report a 1586, type 15 cpu and

2.2? I looked carefully through 2.4 and fixed all places I found where
"1586" or "?86" ('?' == '0'+15) could have been constructed.
setup.c's print_cpu_info() is safe since x86_model_id was set from the
extended cpuid levels data.


/Mikael


--- linux-2.4.0-test10-pre3/arch/i386/kernel/microcode.c.~1~Sat Sep  9 12:49:37 
2000
+++ linux-2.4.0-test10-pre3/arch/i386/kernel/microcode.cTue Oct 17 20:18:07 
+2000
@@ -177,7 +177,7 @@
 
req->err = 1; /* assume the worst */
 
-   if (c->x86_vendor != X86_VENDOR_INTEL || c->x86 < 6){
+   if (c->x86_vendor != X86_VENDOR_INTEL || c->x86 != 6){
printk(KERN_ERR "microcode: CPU%d not an Intel P6\n", cpu_num);
return;
    }
--- linux-2.4.0-test10-pre3/arch/i386/kernel/setup.c.~1~Mon Oct 16 17:29:05 
2000
+++ linux-2.4.0-test10-pre3/arch/i386/kernel/setup.cWed Oct 18 22:36:26 2000
@@ -1548,9 +1548,8 @@
 
/* Pentium IV. */
if (c->x86 == 15) {
-   c->x86 = 6;
get_model_name(c);
-   goto name_decoded;
+   return;
}
 
/* Names for the Pentium II/Celeron processors 
@@ -1689,12 +1688,12 @@
 #endif
p += sprintf(p,"processor\t: %d\n"
"vendor_id\t: %s\n"
-   "cpu family\t: %c\n"
+   "cpu family\t: %d\n"
"model\t\t: %d\n"
"model name\t: %s\n",
n,
c->x86_vendor_id[0] ? c->x86_vendor_id : "unknown",
-       c->x86 + '0',
+   c->x86,
    c->x86_model,
c->x86_model_id[0] ? c->x86_model_id : "unknown");
 
--- linux-2.4.0-test10-pre3/include/asm-i386/bugs.h.~1~ Sat Sep  9 12:49:40 2000
+++ linux-2.4.0-test10-pre3/include/asm-i386/bugs.h Wed Oct 18 22:22:19 2000
@@ -166,6 +166,7 @@
 static void __init check_amd_k6(void)
 {
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+   boot_cpu_data.x86 == 5 &&
boot_cpu_data.x86_model == 6 &&
boot_cpu_data.x86_mask == 1)
{
@@ -426,5 +427,5 @@
check_pentium_f00f();
 #endif
check_cyrix_coma();
-       system_utsname.machine[1] = '0' + boot_cpu_data.x86;
+   system_utsname.machine[1] = '0' + (boot_cpu_data.x86 > 6 ? 6 : 
+boot_cpu_data.x86);
 }
--- linux-2.4.0-test10-pre3/include/asm-i386/elf.h.~1~  Fri Jul 14 18:22:10 2000
+++ linux-2.4.0-test10-pre3/include/asm-i386/elf.h  Wed Oct 18 22:17:06 2000
@@ -93,7 +93,7 @@
For the moment, we have only optimizations for the Intel generations,
but that could change... */
 
-#define ELF_PLATFORM  ("i386\0i486\0i586\0i686"+((boot_cpu_data.x86-3)*5))
+#define ELF_PLATFORM  
+("i386\0i486\0i586\0i686"+(((boot_cpu_data.x86>6?6:boot_cpu_data.x86)-3)*5))
 
 #ifdef __KERNEL__
 #define SET_PERSONALITY(ex, ibcs2) set_personality((ibcs2)?PER_SVR4:PER_LINUX)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3 Ooops

2000-10-18 Thread Mike Galbraith

On Wed, 18 Oct 2000, Mike Galbraith wrote:

> On Wed, 18 Oct 2000, Gary E. Miller wrote:
> 
> > Yo Mike!
> > 
> > On Wed, 18 Oct 2000, Mike Galbraith wrote:
> > 
> > > > Help!  See below for my kernel oops.  I have not been able to use any 
> > > > kernel after 2.4.0-test5 due to this problem.  It happens shortly
> > > > after booting the kernel and is very repeatable.
> > 
> > > Are you sure that you used the right System.map?
> > Yes, I just rechecked the date and time of all the files.  I will
> > re-run the whole test in the morning just to be sure.
> > 
> > > > Trace; c01df362 
> > >  ^^^
> > > Here, scsi_dispatch_cmd() isn't that large.. it's only 0x168 in size.
> > Mine is 0x26c, maybe you have a different compiler or different config?
> 
> Ok, I don't _see_ what could make such a big difference, but something
> obviously does.

(Decides to take another peek.  Yup, scsi logging adds quite a bit)

Looks like deadlock at scsi.c:696 to me.  ide_end_request() has the
io_request_lock on cpu0 and we try to grab it again with the same cpu.

What is unclear to me (SMP.. technology, sufficiently advanced.. magic)
is how in the heck the scsi interrupt happened on cpu0.  It _looks_ to
me as though md_spin_unlock_irq() enabled interrupts at a very bad time.

>>EIP; c0272362<=
Trace; c01df362 scsi.c:696 deadlock cpu0
Trace; c010cab1 
Trace; c010cc98  Uhoh!
Trace; c010b370 
Trace; c02215d3   md_spin_unlock_irq()!!
Trace; c022167a 
Trace; c01927c9  ll_rw_blk.c:1000
Trace; c01c99a8 ide.c:516 (cpu0 has lock,
Trace; c01cdadb (interrupts are disabled)
Trace; c01cb27e 
Trace; c01cda30 
Trace; c010cab1 
Trace; c010cc98 
Trace; c010b370 
Code;  c0272362 

If this were happening on my system, I'd boldly change raid1_free_bh()
like such..

--- drivers/md/raid1.c.org  Wed Oct 18 15:30:07 2000
+++ drivers/md/raid1.c  Wed Oct 18 15:33:08 2000
@@ -91,7 +91,8 @@
 
 static inline void raid1_free_bh(raid1_conf_t *conf, struct buffer_head *bh)
 {
-   md_spin_lock_irq(&conf->device_lock);
+   unsigned long flags;
+   md_spin_lock_irqsave(&conf->device_lock, flags);
while (bh) {
struct buffer_head *t = bh;
bh=bh->b_next;
@@ -103,7 +104,7 @@
conf->freebh_cnt++;
}
}
-   md_spin_unlock_irq(&conf->device_lock);
+   md_spin_unlock_irqrestore(&conf->device_lock, flags);
wake_up(&conf->wait_buffer);
 }

..and see if the problem went away.
 
Now, if I'm off base, someone please clean my clock so I'll understand
better next time I foolishly attempt to figure out an SMP deadlock ;-)

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test10-pre3

2000-10-18 Thread Thomas Molina

On Wed, 18 Oct 2000, Mike A. Harris wrote:

> It's RNCD:  Roman Numeral Coded Decimal.  The new standard for
> information interchange.  A new proprietary feature of Intel
> CPU's that will bring new high performance web sites to the reach
> of the masses!  Now you can access these special custom websites
> only if you have a real Intel CPU!  That is because other CPU's
> from AMD, etc.. do not yet have the high technological edge of
> "Roman Numeral Technology Extensions(TM)"  (RMTE).

And since IBM is getting into the open source movement they will issue
their own version called ERNCDIC (Extended Roman Numeral Coded Decimal
Interchange Code) to complement EBCDIC.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test10-pre3

2000-10-18 Thread Mark Salisbury

On Tue, 17 Oct 2000, David Weinehall wrote:
> On Tue, Oct 17, 2000 at 08:14:58AM -0400, Mark Salisbury wrote:
> > 
> > On Mon, 16 Oct 2000, [EMAIL PROTECTED] wrote:
> > > On Tue, 17 Oct 2000, Mikael Pettersson wrote:
> > 
> > > Why Intel chose family 15 is still beyond me though.
> > 
> > IV is 15 if you just translate the symbols, but ignore the meaning
> > either that or someone was smoking alot of crack.
> 
> Huh? IV == 4. XV == 15.
> 
> I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000...
> 
> Unless you translate the signs one by one. But that's not the correct
> way to do it. 

see the qualifyer in my statement:
"...if you just translate the symbols, but ignore the meaning..."

of course that is not the right way to do it, when has that ever stopped intel?



-- 
/***
**   Mark Salisbury | Mercury Computer Systems**
**   [EMAIL PROTECTED] | System OS - Kernel Team **
****
**  "WYGIWYD - What You Get Is What You Deserve"  **
***/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test10-pre3

2000-10-18 Thread Markus Pfeiffer

"Mike A. Harris" wrote:
> 
> On Tue, 17 Oct 2000, Michael H. Warfield wrote:
> 
> >> > On Mon, 16 Oct 2000, [EMAIL PROTECTED] wrote:
> >> > > On Tue, 17 Oct 2000, Mikael Pettersson wrote:
> >
> >> > > Why Intel chose family 15 is still beyond me though.
> >
> >> > IV is 15 if you just translate the symbols, but ignore the meaning
> >> > either that or someone was smoking alot of crack.
> >
> >> Huh? IV == 4. XV == 15.
> >
> >> I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000...
> 
> It's RNCD:  Roman Numeral Coded Decimal.  The new standard for
> information interchange.  A new proprietary feature of Intel
> CPU's that will bring new high performance web sites to the reach
> of the masses!  Now you can access these special custom websites
> only if you have a real Intel CPU!  That is because other CPU's
> >from AMD, etc.. do not yet have the high technological edge of
> "Roman Numeral Technology Extensions(TM)"  (RMTE).
> 
> Rumor mill has it though that AMD refuses to pay Intel royalties
> for this new technology and is coming out with their own version
> called RM-NOW! (TM) which does everything Intel's does, but
> faster, and with less silicon and heat dissipation.
> 
> Just beware of the CPU Mhz ratings on these new processors.  They
> are given in roman numerals.
>

Intel marketing (c)(r)tm has said that its not Pentium IV but Pentium 4
(officially) ...
I bet one can find numeral puns everywhere ... (look at kernel ...)
My suspicion is that the Pentium 4 will be Intels latest IA32 Processor
familiy (or at least should be if there wasnt this AMD thing (<- note
humor, please)
And I think its not too bad. perhaps they put all their clever heads
into IA64 Processor department and IA32 only got beginners ... :-)

cheers Markus






 
> --
>   Mike A. Harris  -  Linux advocate  -  Open source advocate
>   Computer Consultant - Capslock Consulting
>  Copyright 2000 all rights reserved
> --
> 
> "Facts do not cease to exist because they are ignored."
>- Aldous Huxley
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> Please read the FAQ at http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test10-pre3

2000-10-18 Thread Mike A. Harris

On Tue, 17 Oct 2000, Michael H. Warfield wrote:

>> > On Mon, 16 Oct 2000, [EMAIL PROTECTED] wrote:
>> > > On Tue, 17 Oct 2000, Mikael Pettersson wrote:
>
>> > > Why Intel chose family 15 is still beyond me though.
>
>> > IV is 15 if you just translate the symbols, but ignore the meaning
>> > either that or someone was smoking alot of crack.
>
>> Huh? IV == 4. XV == 15.
>
>> I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000...

It's RNCD:  Roman Numeral Coded Decimal.  The new standard for
information interchange.  A new proprietary feature of Intel
CPU's that will bring new high performance web sites to the reach
of the masses!  Now you can access these special custom websites
only if you have a real Intel CPU!  That is because other CPU's
from AMD, etc.. do not yet have the high technological edge of
"Roman Numeral Technology Extensions(TM)"  (RMTE).

Rumor mill has it though that AMD refuses to pay Intel royalties
for this new technology and is coming out with their own version
called RM-NOW! (TM) which does everything Intel's does, but
faster, and with less silicon and heat dissipation.

Just beware of the CPU Mhz ratings on these new processors.  They
are given in roman numerals.



--
  Mike A. Harris  -  Linux advocate  -  Open source advocate
  Computer Consultant - Capslock Consulting
 Copyright 2000 all rights reserved
--


"Facts do not cease to exist because they are ignored."
   - Aldous Huxley

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4.0-test10-pre3 isapnp-patch breaks sb detection

2000-10-18 Thread Richard Guenther

Hi!

The isapnp change in 2.4.0-test10-pre beaks the detection of
my non-pnp soundblaster AWE32 card (compiled into the kernel,
as modular operation oopses - see mail with Subject:sound related OOPS
(2.4.0-test9-pre7) from Sun, 1 Oct 2000 15:49:46). The kernel
hangs after printing the message
sb: No ISAPnP cards found...
in working kernels after that the sb card with io/irq/etc. is printed.

There are no sound changes in the 2.4.0-test10-pre patchset, so
I highly suspect the isapnp-change - configuring isapnp out
gives a working system again. (note that isapnp is modular in my
system, used for an isapnp ne2000 clone).

Richard.

--
Richard Guenther <[EMAIL PROTECTED]>
WWW: http://www.anatom.uni-tuebingen.de/~richi/
The GLAME Project: http://www.glame.de/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3 Ooops

2000-10-18 Thread Mike Galbraith

On Wed, 18 Oct 2000, Gary E. Miller wrote:

> Yo Mike!
> 
> On Wed, 18 Oct 2000, Mike Galbraith wrote:
> 
> > > Help!  See below for my kernel oops.  I have not been able to use any 
> > > kernel after 2.4.0-test5 due to this problem.  It happens shortly
> > > after booting the kernel and is very repeatable.
> 
> > Are you sure that you used the right System.map?
> Yes, I just rechecked the date and time of all the files.  I will
> re-run the whole test in the morning just to be sure.
> 
> > > Trace; c01df362 
> >  ^^^
> > Here, scsi_dispatch_cmd() isn't that large.. it's only 0x168 in size.
> Mine is 0x26c, maybe you have a different compiler or different config?

Ok, I don't _see_ what could make such a big difference, but something
obviously does.

> > If you want to try some light troubleshooting, grab kdb..
> > ftp://oss.sgi.com/www/projects/kdb/download/ix86/kdb-v1.5-2.4.0-test9-pre9.gz
> > .. and check what both cpus were up to at lock time.  A stack trace of
> > both cpus might help developers locate the trouble.
> 
> I thought Linus was not a big fan of kernel debuggers?  I will look into
> it if I get no better suggestions.

Linus doesn't need a supporter (nads of steel;), but the vast majority
of folks out there aren't in the same league as Linus.  Heck, some of
us aren't even sure we're playing the same sport.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3 Ooops

2000-10-18 Thread Gary E. Miller

Yo Mike!

On Wed, 18 Oct 2000, Mike Galbraith wrote:

> > Help!  See below for my kernel oops.  I have not been able to use any 
> > kernel after 2.4.0-test5 due to this problem.  It happens shortly
> > after booting the kernel and is very repeatable.

> Are you sure that you used the right System.map?
Yes, I just rechecked the date and time of all the files.  I will
re-run the whole test in the morning just to be sure.

> > Trace; c01df362 
>  ^^^
> Here, scsi_dispatch_cmd() isn't that large.. it's only 0x168 in size.
Mine is 0x26c, maybe you have a different compiler or different config?

> If you want to try some light troubleshooting, grab kdb..
> ftp://oss.sgi.com/www/projects/kdb/download/ix86/kdb-v1.5-2.4.0-test9-pre9.gz
> .. and check what both cpus were up to at lock time.  A stack trace of
> both cpus might help developers locate the trouble.

I thought Linus was not a big fan of kernel debuggers?  I will look into
it if I get no better suggestions.

RGDS
GARY
---
Gary E. Miller Rellim 20340 Empire Ave, Suite E-3, Bend, OR 97701
[EMAIL PROTECTED]  Tel:+1(541)382-8588 Fax: +1(541)382-8676


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.0-test10-pre3 Ooops

2000-10-17 Thread Mike Galbraith



On Tue, 17 Oct 2000, Gary E. Miller wrote:

> Yo All!
> 
> Help!  See below for my kernel oops.  I have not been able to use any 
> kernel after 2.4.0-test5 due to this problem.  It happens shortly
> after booting the kernel and is very repeatable.
> 
> This is a dual PII system with PIIX4 ide, 53c875 scsi and Raid 1.
> It is not a production system so I am open to any patches or
> tests.
> 
> The system would not even stay up long enough to run ksymoops so
> I had to copy the data and run it under 2.2.17.
> 
> Any ideas out there?

Hi,

Are you sure that you used the right System.map?

> Trace; c01df362 
 ^^^
Here, scsi_dispatch_cmd() isn't that large.. it's only 0x168 in size.

(scsi_dispatch_cmd+0x1c6 is very close to scsi_wait_req():down (&sem))

If you want to try some light troubleshooting, grab kdb..
ftp://oss.sgi.com/www/projects/kdb/download/ix86/kdb-v1.5-2.4.0-test9-pre9.gz
.. and check what both cpus were up to at lock time.  A stack trace of
both cpus might help developers locate the trouble.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4.0-test10-pre3 Ooops

2000-10-17 Thread Gary E. Miller

Yo All!

Help!  See below for my kernel oops.  I have not been able to use any 
kernel after 2.4.0-test5 due to this problem.  It happens shortly
after booting the kernel and is very repeatable.

This is a dual PII system with PIIX4 ide, 53c875 scsi and Raid 1.
It is not a production system so I am open to any patches or
tests.

The system would not even stay up long enough to run ksymoops so
I had to copy the data and run it under 2.2.17.

Any ideas out there?

RGDS
GARY
---
Gary E. Miller Rellim 20340 Empire Ave, Suite E-3, Bend, OR 97701
[EMAIL PROTECTED]  Tel:+1(541)382-8588 Fax: +1(541)382-8676

ksymoops 2.3.4 on i686 2.2.17pre10-RAID.  Options used
 -V (default)
 -k ./ksyms (specified)
 -l ./modules (specified)
 -o /lib/modules/2.4.0-test10 (specified)
 -m /usr/src/linux/System.map (default)

No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is ./modules a valid lsmod file?
NMI Watchdog detected LOCKUP on CPU0, registers:
CPU:0
EIP:0010:[]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 0082
eax: c01cb154   ebx: c122bdc0   ecx:    edx: 
esi: c12b9d60   edi: 0282   ebp: 000e   esp: c37fbe44
ds: 0018   es: 0018   ss: 0018
Process cc1 (pid: 1484, stackpage=c37fb000)
Stack: c122bdc0 0401  000e c01df362 c010cab1 000e c12b9d60 
   c37fbea8 c036e6c0 c034b9c0 000e c37fbea0 c010cc98 000e c37fbea8 
   c122bdc0 c1384800  c1384c78   c122bdc0 c2722500 
Call Trace: [] [] [] [] [] 
[] [] 
   [] [] [] [] [] [] 
[] 
Code: f3 90 7e f5 e9 f7 8d f5 ff 80 3d 64 39 2f c0 00 f3 90 7e f5 

>>EIP; c0272362<=
Trace; c01df362 
Trace; c010cab1 
Trace; c010cc98 
Trace; c010b370 
Trace; c02215d3 
Trace; c022167a 
Trace; c01927c9 
Trace; c01c99a8 
Trace; c01cdadb 
Trace; c01cb27e 
Trace; c01cda30 
Trace; c010cab1 
Trace; c010cc98 
Trace; c010b370 
Code;  c0272362 
 <_EIP>:
Code;  c0272362<=
   0:   f3 90 repz nop<=
Code;  c0272364 
   2:   7e f5 jlefff9 <_EIP+0xfff9> c027235b 

Code;  c0272366 
   4:   e9 f7 8d f5 ffjmpfff58e00 <_EIP+0xfff58e00> c01cb162 

Code;  c027236b 
   9:   80 3d 64 39 2f c0 00  cmpb   $0x0,0xc02f3964
Code;  c0272372 
  10:   f3 90 repz nop 
Code;  c0272374 
  12:   7e f5 jle9 <_EIP+0x9> c027236b 


1 warning issued.  Results may not be reliable.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test10-pre3

2000-10-17 Thread Michael H. Warfield

On Wed, Oct 18, 2000 at 12:12:39AM +0200, David Weinehall wrote:
> On Tue, Oct 17, 2000 at 08:14:58AM -0400, Mark Salisbury wrote:

> > On Mon, 16 Oct 2000, [EMAIL PROTECTED] wrote:
> > > On Tue, 17 Oct 2000, Mikael Pettersson wrote:

> > > Why Intel chose family 15 is still beyond me though.

> > IV is 15 if you just translate the symbols, but ignore the meaning
> > either that or someone was smoking alot of crack.

> Huh? IV == 4. XV == 15.

> I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000...

He said if you just translate the symbols...  He really sort of
meant translate the digits (Roman Numeral Digits).

i.e...

IV => I   V
I   V  => 1   5
1   5  => 15.

When I was at Michigan State University, we had something similar.
The mainframe there was a CDC 6500, a 6000 series system, and the programing
group called themselves the VIM group for "6000 series".  That was
VIM => VI M => 6 1000 => 6000.

> Unless you translate the signs one by one. But that's not the correct
> way to do it. In that case, Pentium III would have the modelcode
> 0x0111...

No it's not the correct way, but that's not the point, is it.

> Then again, we all know that Intel has made processors in the past that
> couldn't count, so I guess it's possible that they screwed up this time
> too :^)

No.  They didn't screw up.  They just tried a Roman Numeral pun.

> /David
>   _ _
>  // David Weinehall <[EMAIL PROTECTED]> /> Northern lights wander  \\
> //  Project MCA Linux hacker//  Dance across the winter sky //
> \>  http://www.acc.umu.se/~tao/http://www.wittsend.com/mhw/
  NIC whois:  MHW9  |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471|  possible worlds.  A pessimist is sure of it!

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test10-pre3

2000-10-17 Thread David Weinehall

On Tue, Oct 17, 2000 at 08:14:58AM -0400, Mark Salisbury wrote:
> 
> On Mon, 16 Oct 2000, [EMAIL PROTECTED] wrote:
> > On Tue, 17 Oct 2000, Mikael Pettersson wrote:
> 
> > Why Intel chose family 15 is still beyond me though.
> 
> IV is 15 if you just translate the symbols, but ignore the meaning
> either that or someone was smoking alot of crack.

Huh? IV == 4. XV == 15.

I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000...

Unless you translate the signs one by one. But that's not the correct
way to do it. In that case, Pentium III would have the modelcode
0x0111...

Then again, we all know that Intel has made processors in the past that
couldn't count, so I guess it's possible that they screwed up this time
too :^)


/David
  _ _
 // David Weinehall <[EMAIL PROTECTED]> /> Northern lights wander  \\
//  Project MCA Linux hacker//  Dance across the winter sky //
\>  http://www.acc.umu.se/~tao/http://www.tux.org/lkml/



Re: PCMCIA slot 1 invisible under 2.4.0-test10-pre3

2000-10-17 Thread Erik Mouw

On Tue, Oct 17, 2000 at 05:17:38PM -0400, [EMAIL PROTECTED] wrote:
> When I tried 2.4.0-test10-pre3 on my IBM Thinkpad 380XD (the 233 MHz
> flavor), PCMCIA slot #1 (the lower one) seemed to be dead: inserting and
> removing cards gave me no beeps and no log messages.  Slot #0 (the upper
> one) behaved fine.  It didn't matter whether I had one or two cards in.
> 
> test10-pre2 seemed to have the same problem.  test10-pre1 (and its
> predecessors) work fine.
> 
> I have CONFIG_PCMCIA == CONFIG_CARDBUS == 'y' and (on the userspace side)
> pcmcia-cs 3.1.20.

I got the same with test10-pre3: slot #0 worked, slot #1 was dead.
I actualy forgot about it because rebooting solved the problem (the
machine got some extra RAM). My laptop is an Asus P6300 (PII/266, Intel
440 BX/ZX, TI PCI1131 cardbus bridge).


Erik

-- 
J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
of Electrical Engineering, Faculty of Information Technology and Systems,
Delft University of Technology, PO BOX 5031,  2600 GA Delft, The Netherlands
Phone: +31-15-2783635  Fax: +31-15-2781843  Email: [EMAIL PROTECTED]
WWW: http://www-ict.its.tudelft.nl/~erik/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



PCMCIA slot 1 invisible under 2.4.0-test10-pre3

2000-10-17 Thread kimoto

When I tried 2.4.0-test10-pre3 on my IBM Thinkpad 380XD (the 233 MHz
flavor), PCMCIA slot #1 (the lower one) seemed to be dead: inserting and
removing cards gave me no beeps and no log messages.  Slot #0 (the upper
one) behaved fine.  It didn't matter whether I had one or two cards in.

test10-pre2 seemed to have the same problem.  test10-pre1 (and its
predecessors) work fine.

I have CONFIG_PCMCIA == CONFIG_CARDBUS == 'y' and (on the userspace side)
pcmcia-cs 3.1.20.

-- 
<[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test10-pre3

2000-10-17 Thread davej

On Tue, 17 Oct 2000, Mike A. Harris wrote:

> >Well as Intel isn't even shipping P4 samples yet, most of this is just
> >guesswork based upon preliminary datasheets. I wouldn't be surprised
> >if we find other fun things to work around when we start seeing
> >silicone in use.
> 
> Heck, you don't need to wait at all for Intel.  You can get
> "silicone" at the hardware store.  When put on "Windows" it seals
> the heat in.  The heat of course is generated by Intel CPU's.
> ;o)  Sorry I couldn't resist the bad pun on your typo.  ;o)

Oh dear.  Too much wine and not enough sleep.
Be thankful I didn't write code too in that mail :)

regards,

davej.

-- 
| Dave Jones <[EMAIL PROTECTED]>  http://www.suse.de/~davej
| SuSE Labs

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test10-pre3

2000-10-17 Thread Mark Salisbury


On Mon, 16 Oct 2000, [EMAIL PROTECTED] wrote:
> On Tue, 17 Oct 2000, Mikael Pettersson wrote:

> Why Intel chose family 15 is still beyond me though.

IV is 15 if you just translate the symbols, but ignore the meaning

either that or someone was smoking alot of crack.

-- 
/***
**   Mark Salisbury | Mercury Computer Systems**
**   [EMAIL PROTECTED] | System OS - Kernel Team **
****
**  "WYGIWYD - What You Get Is What You Deserve"  **
***/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: PC speaker driver patch for 2.4.0-test10-pre3

2000-10-17 Thread Alan Cox

> >Thanks to Erik Inge Bols=F8 for porting it to 2.3.45, this saving me m=
> ost of=20
> >the work.
> 
> Is there a major compelling reason that this patch isn't included
> in the standard kernel tree?

It goes hacking around with the clock

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: compile error in 2.4.0-test10-pre3

2000-10-17 Thread Igmar Palsenberg


> [root@wr5z linux]# make bzImage
> gcc -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -o
> scripts/split-include scripts/split-include.c
> In file included from /usr/include/errno.h:36,
>  from scripts/split-include.c:26:
> /usr/include/bits/errno.h:25: linux/errno.h: No such file or directory
  
  
  You're missing the link to /usr/src/linux from /usr/include

  In my case, /usr/include/linux is linked to /usr/src/linux/include/linux
  Same goes for /usr/include/asm, which points to 
  /usr/src/linux/include/asm

> make: *** [scripts/split-include] Error 1


  Igmar 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test10-pre3

2000-10-17 Thread Mike A. Harris

On Tue, 17 Oct 2000 [EMAIL PROTECTED] wrote:

>*grin*
>
>Well as Intel isn't even shipping P4 samples yet, most of this is just
>guesswork based upon preliminary datasheets. I wouldn't be surprised
>if we find other fun things to work around when we start seeing
>silicone in use.

Heck, you don't need to wait at all for Intel.  You can get
"silicone" at the hardware store.  When put on "Windows" it seals
the heat in.  The heat of course is generated by Intel CPU's.

;o)  Sorry I couldn't resist the bad pun on your typo.  ;o)


--
  Mike A. Harris  -  Linux advocate  -  Open source advocate
  Computer Consultant - Capslock Consulting
 Copyright 2000 all rights reserved
--

Red Hat Linux:  http://www.redhat.com
Download for free:  ftp://ftp.redhat.com/pub/redhat/redhat-6.2/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: two patches for 2.4.0-test10-pre3

2000-10-16 Thread Rusty Russell

In message <[EMAIL PROTECTED]> you write:
> --- linux/net/ipv6/netfilter/Config.in.oldMon Oct 16 17:25:17 2000
> +++ linux/net/ipv6/netfilter/Config.inMon Oct 16 17:46:07 2000
> +if [ "$CONFIG_IP6_NF_TARGET_MARK" = "y" -a "$CONFIG_IP_NF_IPTABLES" != "
y" ]; then
> +  define_bool CONFIG_IP_NF_IPTABLES y

Hi Daiki!

The code in ip6t_MARK.c shouldn't depend on IPv4: the bug is there
(probably should be ip6t_register_target, etc.).

Hope that helps,
Rusty.
--
Hacking time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: PC speaker driver patch for 2.4.0-test10-pre3

2000-10-16 Thread David Woodhouse

On Tue, 17 Oct 2000, Mike A. Harris wrote:

> Is there a major compelling reason that this patch isn't included
> in the standard kernel tree?

It does _evil_ things with the timers. If we shift the system timer tick
onto the RTC, it won't be so evil, and I'd consider trying to submit it
for 2.5.

Patches accepted. 

-- 
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: compile error in 2.4.0-test10-pre3

2000-10-16 Thread Keith Owens

On Tue, 17 Oct 2000 00:45:13 -0500 (CDT), 
Thomas Molina <[EMAIL PROTECTED]> wrote:
>I patched from 2.4.0-test9 to 2.4.0-test10-pre3 successfully.  I then
>did make mrproper, make oldconfig, make dep successfully.  make bzImage
>resulted in the following error:
>
>[root@wr5z linux]# make bzImage
>gcc -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -o
>scripts/split-include scripts/split-include.c
>In file included from /usr/include/errno.h:36,
> from scripts/split-include.c:26:
>/usr/include/bits/errno.h:25: linux/errno.h: No such file or directory
>make: *** [scripts/split-include] Error 1

/usr/include/bits/errno.h assumes that /usr/include/linux/errno.h
always exists.  Linus says that /usr/include/linux should contain
constant versions of the kernel headers that reflect the kernel used to
build glibc.  However on many distributions, /usr/include/linux is a
symlink to some directory under /usr/src, which is fine until that
directory changes or goes away.  The same problem applies to
/usr/include/asm.

You probably do not have a /usr/src/linux directory any more.
Reinstall the kernel headers that came with your distribution then ask
the distributor why they ship in that format.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: compile error in 2.4.0-test10-pre3

2000-10-16 Thread Thomas Molina

On Tue, 17 Oct 2000, Thomas Molina wrote:

> I patched from 2.4.0-test9 to 2.4.0-test10-pre3 successfully.  I then
> did make mrproper, make oldconfig, make dep successfully.  make bzImage
> resulted in the following error:

Would someone please hand me one of those brown paper bags.

Note to self:  Next time get some sleep before doing something stupid
late at night and announcing same before everyone.  I was confirming my
problem about two minutes after sending that message, just before David
Miller was kind enough to let me know what I was doing wrong.

This was my first time doing a compile in a directory other than
/usr/src/linux per Linus' suggestion.  Specifically, /usr/include/asm
and /usr/include/linux were stale symlinks.  Pointing them to the new
kernel source tree allowed the compile to proceed without incident.

My hazy recollection told me Linus had said those symlinks shouldn't
even be needed or should point to the "default" source tree against
which glibc was compiled.  I guess I was wrong.

Hopefully this gets out before I get too many repeats of David's
message.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



compile error in 2.4.0-test10-pre3

2000-10-16 Thread Thomas Molina

I patched from 2.4.0-test9 to 2.4.0-test10-pre3 successfully.  I then
did make mrproper, make oldconfig, make dep successfully.  make bzImage
resulted in the following error:

[root@wr5z linux]# make bzImage
gcc -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -o
scripts/split-include scripts/split-include.c
In file included from /usr/include/errno.h:36,
 from scripts/split-include.c:26:
/usr/include/bits/errno.h:25: linux/errno.h: No such file or directory
make: *** [scripts/split-include] Error 1

Additional data:

[root@wr5z linux]# sh scripts/ver_linux
-- Versions installed: (if some fields are empty or look
-- unusual then possibly you have very old versions)
Linux wr5z 2.4.0-test9-pre2 #1 Sun Sep 17 21:35:55 CDT 2000 i586 unknown
Kernel modules 2.3.16
Gnu C  egcs-2.91.66
Gnu Make   3.78.1
Binutils   2.9.5.0.22
Linux C Library2.1.3
Dynamic linker ldd (GNU libc) 2.1.3
Procps 2.0.6
Mount  2.10f
Net-tools  1.54
Console-tools  0.3.3
Sh-utils   2.0
Modules Loaded opl3

#
# Automatically generated make config: don't edit
#
CONFIG_X86=y
CONFIG_ISA=y
# CONFIG_SBUS is not set
CONFIG_UID16=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y

#
# Processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_M686FXSR is not set
CONFIG_MK6=y
# CONFIG_MK7 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_L1_CACHE_BYTES=32
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_TSC=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
# CONFIG_TOSHIBA is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_SMP is not set
# CONFIG_X86_UP_IOAPIC is not set

#
# General setup
#
CONFIG_NET=y
# CONFIG_VISWS is not set
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
# CONFIG_MCA is not set
# CONFIG_HOTPLUG is not set
# CONFIG_PCMCIA is not set
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
# CONFIG_KCORE_AOUT is not set
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
CONFIG_PM=y
# CONFIG_ACPI is not set
CONFIG_APM=y
# CONFIG_APM_IGNORE_USER_SUSPEND is not set
# CONFIG_APM_DO_ENABLE is not set
# CONFIG_APM_CPU_IDLE is not set
# CONFIG_APM_DISPLAY_BLANK is not set
# CONFIG_APM_RTC_IS_GMT is not set
# CONFIG_APM_ALLOW_INTS is not set
CONFIG_APM_REAL_MODE_POWER_OFF=y

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_AMIGA is not set
# CONFIG_PARPORT_MFC3 is not set
# CONFIG_PARPORT_ATARI is not set
# CONFIG_PARPORT_SUNBPP is not set
# CONFIG_PARPORT_OTHER is not set
CONFIG_PARPORT_1284=y

#
# Plug and Play configuration
#
# CONFIG_PNP is not set
# CONFIG_ISAPNP is not set

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_XD is not set
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_NBD is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=4096
# CONFIG_BLK_DEV_INITRD is not set

#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set
# CONFIG_BLK_DEV_MD is not set
# CONFIG_MD_LINEAR is not set
# CONFIG_MD_RAID0 is not set
# CONFIG_MD_RAID1 is not set
# CONFIG_MD_RAID5 is not set
# CONFIG_BLK_DEV_LVM is not set
# CONFIG_LVM_PROC_FS is not set

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK=y
CONFIG_RTNETLINK=y
CONFIG_NETLINK_DEV=y
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
# CONFIG_FILTER is not set
CONFIG_UNIX=y
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_INET_ECN is not set
CONFIG_SYN_COOKIES=y

#
#   IP: Netfilter Configuration
#
CONFIG_IP_NF_CONNTRACK=y
CONFIG_IP_NF_FTP=y
# CONFIG_IP_NF_QUEUE is not set
CONFIG_IP_NF_IPTABLES=y
# CONFIG_IP_NF_MATCH_LIMIT is not set
# CONFIG_IP_NF_MATCH_MAC is not set
# CONFIG_IP_NF_MATCH_MARK is not set
# CONFIG_IP_NF_MATCH_MULTIPORT is not set
# CONFIG_IP_NF_MATCH_TOS is not set
CONFIG_IP_NF_MATCH_STATE=y
# CONFIG_IP_NF_MATCH_UN

Re: test10-pre3

2000-10-16 Thread H. Peter Anvin

Followup to:  <[EMAIL PROTECTED]>
By author:[EMAIL PROTECTED]
In newsgroup: linux.dev.kernel
> 
> > but intel refuse to guarantee they wont produce an actual '786' class
> > CPU.
> 
> Worry about that if/when it happens ?
> 

Dare one guess the 786 is actually the Itanic in x86 mode?

-hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



  1   2   >