Re: ps hang in 241-pre10

2001-01-29 Thread Zdenek Kabelac

Linus Torvalds wrote:
> 
> In article <[EMAIL PROTECTED]>, David Ford  <[EMAIL PROTECTED]>
> wrote:
> >
> >We've narrowed it down to "we're all running xmms" when it happened.
> 
> Does anybody have a clue about what is different with xmms?
> 
> Does it use KNI if it can, for example? We used to have a problem with

Seeing this, I'll add my report here too: I was burning an audio CD
last week, and while I was moving the slider the system locked up. I
think the kernel version was -ac7. I then switched to pre8 and played
a DivX file while burning four other CDs with no problem.

My system is an SMP BP6 with an SBLive, using the kernel's emu driver.

-- 
 There are three types of people in the world:
   those who can count, and those who can't.
  Zdenek Kabelac  http://i.am/kabi/ [EMAIL PROTECTED] {debian.org; fi.muni.cz}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-28 Thread Jens Axboe

On Sun, Jan 28 2001, Linus Torvalds wrote:
> On Sun, 28 Jan 2001, Jens Axboe wrote:
> > 
> > How about this instead?
> 
> I really don't like this one. It will basically re-introduce the old
> behaviour of waking people up in a trickle, as far as I can tell. The
> reason we want the batching is to make people have more requests to sort
> in the elevator, and as far as I can tell this will just hurt that.
> 
> Are there any downsides to just _always_ batching, regardless of whether
> the request freelist is empty or not? Sure, it will make the "effective"
> size of the freelist a bit smaller, but that's probably not actually
> noticeable under any load except for the one that empties the freelist (in
> which case the old code would have triggered the batching anyway).

The problem with removing the !list_empty test like you suggested
is that batching is no longer controlled. If we start
batching once the lists are empty and start wakeups once batch_requests
has been reached, we know we'll give the elevator enough to work
with to be effective. With !list_empty removed, batch_requests is no
longer a measure of how many requests we want to batch. Always
batching is not a problem in itself; the effect of the smaller
effective freelist should be negligible.

The patch I sent will only trickle wakeups when batching is already
in effect but the batch_requests wakeups were not enough to deplete
the freelist again. At least that was the intended effect :-)

> Performance numbers?

Don't have any right now, will test a bit later.

-- 
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs

Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-28 Thread Linus Torvalds

On Sun, 28 Jan 2001, Jens Axboe wrote:
> 
> How about this instead?

I really don't like this one. It will basically re-introduce the old
behaviour of waking people up in a trickle, as far as I can tell. The
reason we want the batching is to make people have more requests to sort
in the elevator, and as far as I can tell this will just hurt that.

Are there any downsides to just _always_ batching, regardless of whether
the request freelist is empty or not? Sure, it will make the "effective"
size of the freelist a bit smaller, but that's probably not actually
noticeable under any load except for the one that empties the freelist (in
which case the old code would have triggered the batching anyway).

Performance numbers?

Linus


Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-28 Thread Jens Axboe


On Sun, Jan 28 2001, Lorenzo Allegrucci wrote:
> >Ho humm. Jens: imagine that you have more people waiting for requests than
> >"batchcount". Further, imagine that you have multiple requests finishing
> >at the same time. Not unlikely. Now, imagine that one request finishes,
> >and causes "batchcount" users to wake up, and immediately another request
> >finishes but THAT one doesn't wake anybody up because it notices that the
> >freelist isn't empty - so it thinks that it doesn't need to wake anybody.
> >
> >Lorenzo, does the problem go away for you if you remove the
> >
> >   if (!list_empty(&q->request_freelist[rw])) {
> >   ...
> >   }
> >
> >code from blkdev_release_request() in drivers/block/ll_rw_blk.c?
> 
> Yes, it does.

How about this instead?

--- /opt/kernel/linux-2.4.1-pre10/drivers/block/ll_rw_blk.c	Thu Jan 25 19:15:12 2001
+++ drivers/block/ll_rw_blk.c	Sun Jan 28 19:22:20 2001
@@ -633,6 +634,8 @@
 		if (!list_empty(&q->request_freelist[rw])) {
 			blk_refill_freelist(q, rw);
 			list_add(&req->table, &q->request_freelist[rw]);
+			if (waitqueue_active(&q->wait_for_request))
+				wake_up_nr(&q->wait_for_request, 2);
 			return;
 		}
 

-- 
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs
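
A rough user-space sketch of what the patched release path above would do
during a burst of completions, under stated assumptions: it is a toy model,
not kernel code. The names wake_stats, record_wakeup and
patched_release_burst are hypothetical, woken tasks are assumed not to take
requests until the burst is over, and the freelist is assumed to start
empty. It only counts how many wakeup calls are issued and how large they
are.

```c
#include <assert.h>

struct wake_stats {
    int calls;    /* number of wakeup calls that woke at least one task */
    int tasks;    /* total tasks woken */
    int largest;  /* most tasks woken by a single call */
};

/* Wake up to nr sleepers; does nothing when nobody is waiting, which
 * stands in for the waitqueue_active() check in the patch. */
static void record_wakeup(struct wake_stats *s, int *sleepers, int nr)
{
    int n = nr < *sleepers ? nr : *sleepers;

    if (n == 0)
        return;
    *sleepers -= n;
    s->calls++;
    s->tasks += n;
    if (n > s->largest)
        s->largest = n;
}

/* Model of `completions` requests being released back to back. */
struct wake_stats patched_release_burst(int sleepers, int batch, int completions)
{
    struct wake_stats s = { 0, 0, 0 };
    int free_reqs = 0;  /* requests on the freelist */
    int pending = 0;    /* requests held back for batching */

    assert(batch > 0);
    for (int i = 0; i < completions; i++) {
        if (free_reqs > 0) {
            /* Patched branch: freelist not empty, wake at most 2. */
            free_reqs++;
            record_wakeup(&s, &sleepers, 2);
            continue;
        }
        if (++pending >= batch) {
            /* Batch is full: move it to the freelist, wake that many. */
            free_reqs += pending;
            record_wakeup(&s, &sleepers, pending);
            pending = 0;
        }
    }
    return s;
}
```

In this model, with 100 sleepers, a batch size of 8 and 32 completions,
the first wakeup covers 8 tasks and each later release wakes 2, which is
the trickle pattern discussed elsewhere in the thread.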

Re: ps hang in 241-pre10

2001-01-28 Thread Linus Torvalds

On Sun, 28 Jan 2001, Marcelo Tosatti wrote:
> 
> Why don't you just put set_page_dirty() back in page_launder() in case
> writepage() fails?

Because an EIO or similar should _not_ be re-tried or kept dirty.

Imagine a bad user that goes over his quota on purpose, and then every
single write will always return an error. What should we do? Let him eat
all physical memory? I don't think so. 

write-out errors will be ignored. We _might_ send a signal or something,
but considering the fact that we don't even know who caused the dirty page
in the first place, even that is kind of hard.

Shared memory and out-of-swap is special - the shared memory code is
supposed to check that we have enough memory before it even allocates
anything.

Linus
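
The quota argument above can be sketched in user space, as a toy model
only. The names page_model, writepage_always_fails and
dirty_left_after_launder are hypothetical, and the fixed -EIO return is an
assumption standing in for "every single write will always return an
error". The function counts how many pages stay dirty (and therefore
cannot be reclaimed) after one laundering pass.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 64

struct page_model {
    int dirty;  /* page still needs writing out */
    int freed;  /* page has been reclaimed */
};

/* Hypothetical writepage() for a user who is over quota: every write fails. */
static int writepage_always_fails(struct page_model *page)
{
    (void)page;
    return -EIO;
}

/*
 * Run one laundering pass over n dirty pages whose writes all fail.
 * Returns the number of pages that are still dirty afterwards.
 */
size_t dirty_left_after_launder(size_t n, int redirty_on_error)
{
    struct page_model pages[MODEL_MAX_PAGES];
    size_t still_dirty = 0;

    assert(n <= MODEL_MAX_PAGES);
    for (size_t i = 0; i < n; i++) {
        pages[i].dirty = 1;
        pages[i].freed = 0;
    }
    for (size_t i = 0; i < n; i++) {
        pages[i].dirty = 0;  /* dirty bit cleared before the write attempt */
        if (writepage_always_fails(&pages[i]) < 0 && redirty_on_error) {
            pages[i].dirty = 1;  /* kept dirty: memory stays pinned */
            still_dirty++;
        } else {
            pages[i].freed = 1;  /* error ignored: page can be reclaimed */
        }
    }
    return still_dirty;
}
```

With re-dirtying on error, every page of the failing writer stays pinned;
with errors ignored, all of them can be reclaimed.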


Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-28 Thread Lorenzo Allegrucci


At 15.40 27/01/01 -0800, you wrote:
>
>
>On Sat, 27 Jan 2001, Lorenzo Allegrucci wrote:
>> 
>> A trivial "while(1) fork()" is enough to trigger it.
>> "mem=32M" by lilo, ulimit -u is 1024.
>
>Hmm.. This does not look like a VM deadlock - it looks like some IO
>request is waiting forever on "__get_request_wait()". In fact, it looks
>like a _lot_ of people are waiting for requests.
>
>So what happens is that somebody takes a page fault (and gets the mm
>lock), tries to read something in, and never gets anything back, thus
>leaving the MM locked.
>
>Jens: this looks suspiciously like somebody isn't waking things up when
>they add requests back to the request lists. Alternatively, maybe the
>unplugging isn't properly done, so that we have a lot of pending IO that
>doesn't get started..
>
>Ho humm. Jens: imagine that you have more people waiting for requests than
>"batchcount". Further, imagine that you have multiple requests finishing
>at the same time. Not unlikely. Now, imagine that one request finishes,
>and causes "batchcount" users to wake up, and immediately another request
>finishes but THAT one doesn't wake anybody up because it notices that the
>freelist isn't empty - so it thinks that it doesn't need to wake anybody.
>
>Lorenzo, does the problem go away for you if you remove the
>
>   if (!list_empty(&q->request_freelist[rw])) {
>   ...
>   }
>
>code from blkdev_release_request() in drivers/block/ll_rw_blk.c?

Yes, it does.

--
Lorenzo
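
The "more waiters than batchcount" scenario quoted above can be sketched
as a small user-space model. This is a simplification under stated
assumptions, not the kernel code: the names release_policy, queue_model
and woken_after_burst are hypothetical, the freelist is assumed to start
empty, and woken tasks are assumed not to take a request before the burst
of completions ends. It compares the pre10 behaviour (no wakeup while the
freelist is non-empty) with always going through the batching path.

```c
#include <assert.h>

enum release_policy {
    POLICY_PRE10,        /* no batching or wakeup while the freelist is non-empty */
    POLICY_ALWAYS_BATCH  /* every release goes through the pending list */
};

struct queue_model {
    int free_reqs;  /* requests on the freelist */
    int pending;    /* released requests held back for batching */
    int sleepers;   /* tasks asleep waiting for a request */
    int woken;      /* tasks woken so far */
    int batch;      /* stand-in for batch_requests */
};

/* Wake at most nr sleepers, like a wake_up_nr() on the wait queue. */
static void wake_up_nr_model(struct queue_model *q, int nr)
{
    int n = nr < q->sleepers ? nr : q->sleepers;

    q->sleepers -= n;
    q->woken += n;
}

static void release_request_model(struct queue_model *q, enum release_policy policy)
{
    if (policy == POLICY_PRE10 && q->free_reqs > 0) {
        q->free_reqs++;  /* freed directly, nobody is woken */
        return;
    }
    q->pending++;
    if (q->pending >= q->batch) {
        int n = q->pending;

        q->free_reqs += n;  /* splice the batch onto the freelist */
        q->pending = 0;
        wake_up_nr_model(q, n);
    }
}

/* Number of sleepers woken by `completions` back-to-back releases. */
int woken_after_burst(enum release_policy policy, int sleepers, int batch,
                      int completions)
{
    struct queue_model q = { 0, 0, sleepers, 0, batch };

    assert(batch > 0);
    for (int i = 0; i < completions; i++)
        release_request_model(&q, policy);
    return q.woken;
}
```

In this model, 32 completions with a batch size of 8 and 100 sleepers wake
only 8 tasks under the pre10 policy, leaving free requests that no sleeper
was woken for, while the always-batch policy wakes 32.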

Re: ps hang in 241-pre10

2001-01-28 Thread James Sutherland

On 27 Jan 2001, Linus Torvalds wrote:

> In article <[EMAIL PROTECTED]>, David Ford  <[EMAIL PROTECTED]>
> wrote:
> >
> >We've narrowed it down to "we're all running xmms" when it happened.
> 
> Does anybody have a clue about what is different with xmms?
> 
> Does it use KNI if it can, for example? We used to have a problem with
> KNI+Athlons, for example. 

Not KNI, I don't think, but 1.2.4 did add support for 3dnow!, with
auto-detection of CPU type. Disabled by default, but available. Are there
any 3dnow! issues??

> It might also be that it's threading-related, and that XMMS is one of
> the few things that uses threads. Things like that. I'm not an XMMS
> user, can somebody who knows XMMS comment on things that it does that
> are unusual?

Always uses threads, can use 3dnow!, DGA and realtime priority. Can also
do direct hardware access to some graphics cards (inc SB16), but I haven't
looked at that one closely.

James.

Re: ps hang in 241-pre10

2001-01-27 Thread Shawn Starr


Patch appears to work,
for i in [0-9]*; do echo $i; cat $i/stat > /dev/null; done
completes successfully with xmms running in "real-time" priority.

Shawn.

Marcelo Tosatti wrote:

> On Sat, 27 Jan 2001, Linus Torvalds wrote:
>
> >
> >
> > On Sun, 28 Jan 2001, Marcelo Tosatti wrote:
> > > >
> > > > This is the smoking gun here, I bet, but I'd like to make sure I see the
> > > > whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
> > > > but I think this is the thread that hangs on to the mm semaphore.
> > >
> > > I was able to reproduce it here with dbench.
> > >
> > > Nothing is locked except this dbench thread (the only dbench thread):
> > >
> > > dbench    D C1C9FE64  5200  1013      1 (L-TLB)  1370   785
> > > Call Trace: [___wait_on_page+130/160] [truncate_list_pages+100/404]
> > >   [truncate_inode_pages+93/128] [iput+162/360] [dput+262/356] [fput+121/232]
> > >   [exit_mmap+218/292] [mmput+56/80] [do_exit+208/680] [do_signal+566/656]
> > >   [dput+25/356] [path_release+13/60] [sys_newstat+100/112] [sys_read+188/196]
> > >   [signal_return+20/24]
> >
> > Ok, this definitely seems to be the pattern.
> >
> > I don't see _what_ is going on, though.
> >
> > I know of one "known bug" in pre10: if you run out of swap-space with
> > shared memory segments, it will do the wrong thing (return 1 without
> > unlocking the page). xmms might trigger this, but I didn't think that
> > dbench used shared memory?
>
> It does. Bingo.
>
> I'm not able to reproduce the problem here with your patch.
>
> Btw, there is another bug in shm_writepage() where it does not set the
> page dirty in case of failure...

Re: ps hang in 241-pre10

2001-01-27 Thread Marcelo Tosatti



On Sun, 28 Jan 2001, Marcelo Tosatti wrote:

> On Sat, 27 Jan 2001, Linus Torvalds wrote:
> 
> > 
> > 
> > On Sun, 28 Jan 2001, Marcelo Tosatti wrote:
> > > > 
> > > > This is the smoking gun here, I bet, but I'd like to make sure I see the
> > > > whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
> > > > but I think this is the thread that hangs on to the mm semaphore.
> > > 
> > > I was able to reproduce it here with dbench. 
> > > 
> > > Nothing is locked except this dbench thread (the only dbench thread):
> > > 
> > > dbench    D C1C9FE64  5200  1013      1 (L-TLB)  1370   785
> > > Call Trace: [___wait_on_page+130/160] [truncate_list_pages+100/404]
> > >   [truncate_inode_pages+93/128] [iput+162/360] [dput+262/356] [fput+121/232]
> > >   [exit_mmap+218/292] [mmput+56/80] [do_exit+208/680] [do_signal+566/656]
> > >   [dput+25/356] [path_release+13/60] [sys_newstat+100/112] [sys_read+188/196]
> > >   [signal_return+20/24]
> > 
> > Ok, this definitely seems to be the pattern.
> > 
> > I don't see _what_ is going on, though.
> > 
> > I know of one "known bug" in pre10: if you run out of swap-space with
> > shared memory segments, it will do the wrong thing (return 1 without
> > unlocking the page). xmms might trigger this, but I didn't think that
> > dbench used shared memory?
> 
> It does. Bingo.
> 
> I'm not able to reproduce the problem here with your patch. 
> 
> Btw, there is another bug in shm_writepage() where it does not set the
> page dirty in case of failure...

Why don't you just put set_page_dirty() back in page_launder() in case
writepage() fails?

Otherwise you'll have to do it in every specific implementation of
writepage().


Re: ps hang in 241-pre10

2001-01-27 Thread Marcelo Tosatti




On Sat, 27 Jan 2001, Linus Torvalds wrote:

> 
> 
> On Sun, 28 Jan 2001, Marcelo Tosatti wrote:
> > > 
> > > This is the smoking gun here, I bet, but I'd like to make sure I see the
> > > whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
> > > but I think this is the thread that hangs on to the mm semaphore.
> > 
> > I was able to reproduce it here with dbench. 
> > 
> > Nothing is locked except this dbench thread (the only dbench thread):
> > 
> > dbench    D C1C9FE64  5200  1013      1 (L-TLB)  1370   785
> > Call Trace: [___wait_on_page+130/160] [truncate_list_pages+100/404]
> >   [truncate_inode_pages+93/128] [iput+162/360] [dput+262/356] [fput+121/232]
> >   [exit_mmap+218/292] [mmput+56/80] [do_exit+208/680] [do_signal+566/656]
> >   [dput+25/356] [path_release+13/60] [sys_newstat+100/112] [sys_read+188/196]
> >   [signal_return+20/24]
> 
> Ok, this definitely seems to be the pattern.
> 
> I don't see _what_ is going on, though.
> 
> I know of one "known bug" in pre10: if you run out of swap-space with
> shared memory segments, it will do the wrong thing (return 1 without
> unlocking the page). xmms might trigger this, but I didn't think that
> dbench used shared memory?

It does. Bingo.

I'm not able to reproduce the problem here with your patch. 

Btw, there is another bug in shm_writepage() where it does not set the
page dirty in case of failure...


Re: ps hang in 241-pre10

2001-01-27 Thread Linus Torvalds




On Sun, 28 Jan 2001, Marcelo Tosatti wrote:
> > 
> > This is the smoking gun here, I bet, but I'd like to make sure I see the
> > whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
> > but I think this is the thread that hangs on to the mm semaphore.
> 
> I was able to reproduce it here with dbench. 
> 
> Nothing is locked except this dbench thread (the only dbench thread):
> 
> dbench    D C1C9FE64  5200  1013      1 (L-TLB)  1370   785
> Call Trace: [___wait_on_page+130/160] [truncate_list_pages+100/404]
>   [truncate_inode_pages+93/128] [iput+162/360] [dput+262/356] [fput+121/232]
>   [exit_mmap+218/292] [mmput+56/80] [do_exit+208/680] [do_signal+566/656]
>   [dput+25/356] [path_release+13/60] [sys_newstat+100/112] [sys_read+188/196]
>   [signal_return+20/24]

Ok, this definitely seems to be the pattern.

I don't see _what_ is going on, though.

I know of one "known bug" in pre10: if you run out of swap-space with
shared memory segments, it will do the wrong thing (return 1 without
unlocking the page). xmms might trigger this, but I didn't think that
dbench used shared memory?

There's also an ugliness in the truncate ordering. I don't think it should
matter, but I do believe it's conceptually wrong as-is.

Does this patch make any difference at all?

Linus

-
diff -u --recursive --new-file pre10/linux/mm/memory.c linux/mm/memory.c
--- pre10/linux/mm/memory.c	Sat Jan 27 10:53:39 2001
+++ linux/mm/memory.c	Sat Jan 27 19:12:35 2001
@@ -945,7 +945,6 @@
 	if (inode->i_size < offset)
 		goto do_expand;
 	inode->i_size = offset;
-	truncate_inode_pages(mapping, offset);
 	spin_lock(&mapping->i_shared_lock);
 	if (!mapping->i_mmap && !mapping->i_mmap_shared)
 		goto out_unlock;
@@ -960,8 +959,7 @@
 
 out_unlock:
 	spin_unlock(&mapping->i_shared_lock);
-	/* this should go into ->truncate */
-	inode->i_size = offset;
+	truncate_inode_pages(mapping, offset);
 	if (inode->i_op && inode->i_op->truncate)
 		inode->i_op->truncate(inode);
 	return;
diff -u --recursive --new-file pre10/linux/mm/shmem.c linux/mm/shmem.c
--- pre10/linux/mm/shmem.c	Sat Jan 27 10:53:39 2001
+++ linux/mm/shmem.c	Sat Jan 27 19:50:08 2001
@@ -217,8 +217,11 @@
 
 	info = &page->mapping->host->u.shmem_i;
 	swap = __get_swap_page(2);
-	if (!swap.val)
-		return 1;
+	if (!swap.val) {
+		set_page_dirty(page);
+		UnlockPage(page);
+		return -ENOMEM;
+	}
 
 	spin_lock(&info->lock);
 	shmem_recalc_inode(page->mapping->host);
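
The shmem.c hunk above changes what happens to the page when no swap is
available. A minimal user-space sketch of that difference follows, as a toy
model only: page_model, shm_writepage_pre10, shm_writepage_patched and
hang_after_failed_writeout are hypothetical names, and the assumption that
the success path unlocks the page is made just for this model.

```c
#include <assert.h>
#include <errno.h>

struct page_model {
    int locked;  /* page lock held */
    int dirty;   /* page still needs writing out */
};

/* Pre-patch behaviour as described in the thread: on out-of-swap,
 * return 1 and leave the page locked. */
int shm_writepage_pre10(struct page_model *page, int swap_available)
{
    if (!swap_available)
        return 1;
    page->locked = 0;  /* assumption: the success path unlocks */
    return 0;
}

/* Behaviour after the patch: re-dirty, unlock, return -ENOMEM. */
int shm_writepage_patched(struct page_model *page, int swap_available)
{
    if (!swap_available) {
        page->dirty = 1;
        page->locked = 0;
        return -ENOMEM;
    }
    page->locked = 0;  /* assumption: the success path unlocks */
    return 0;
}

/* A task waiting on the page sleeps for as long as it stays locked. */
int waiter_would_hang(const struct page_model *page)
{
    return page->locked;
}

/* Try to write out a locked page with no swap available and report
 * whether a later waiter on that page would sleep forever. */
int hang_after_failed_writeout(int patched)
{
    struct page_model page = { 1, 0 };  /* locked before writepage is called */

    if (patched)
        shm_writepage_patched(&page, 0);
    else
        shm_writepage_pre10(&page, 0);
    return waiter_would_hang(&page);
}
```

In this model the pre-patch path leaves the page locked after a failed
write-out, so anything that later waits on the page never wakes up; the
patched path leaves it unlocked and dirty.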


Re: ps hang in 241-pre10

2001-01-27 Thread Marcelo Tosatti



(ugh, sorry about last mail)

On 27 Jan 2001, Linus Torvalds wrote:

> In article <[EMAIL PROTECTED]>, David Ford  <[EMAIL PROTECTED]> wrote:
> >Unfortunately klogd reads /procerg.
> >
> >So the following is a painstakingly slow hand translation, I'll only print
> >the D state entries unless someone asks otherwise.
> 
> You seem to be pretty much able to reproduce this at will, right?
> 
> I'd really like to see the raw System.map and dmesg output if your
> syslogd doesn't do a proper job of getting the symbols interpreted: just
> send the things by email, and I'll put something together.  It's too
> hard to interpret your half-way decoded thing, and I really want to see
> what this xmms thing is doing.. 
> 
> >xmms  D CACC5EA8  4116   713155   715  (NOTLB)1493   674
> >Call Trace: [] [] [] [] []
> >[] []
> >   [] [] [] [] []
> >
> >c01248e4 T ___wait_on_page
> >c0124984 t __lock_page
> >
> >c01240dc t truncate_list_pages
> >c0124268 T truncate_inode_pages
> >c01242d4 t writeout_one_page
> 
> This is the smoking gun here, I bet, but I'd like to make sure I see the
> whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
> but I think this is the thread that hangs on to the mm semaphore.

I was able to reproduce it here with dbench. 

Nothing is locked except this dbench thread (the only dbench thread):

dbench    D C1C9FE64  5200  1013      1 (L-TLB)  1370   785
Call Trace: [___wait_on_page+130/160] [truncate_list_pages+100/404]
  [truncate_inode_pages+93/128] [iput+162/360] [dput+262/356] [fput+121/232]
  [exit_mmap+218/292] [mmput+56/80] [do_exit+208/680] [do_signal+566/656]
  [dput+25/356] [path_release+13/60] [sys_newstat+100/112] [sys_read+188/196]
  [signal_return+20/24]



Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-27 Thread Marcelo Tosatti



On Sun, 28 Jan 2001, Jens Axboe wrote:

> On Sat, Jan 27 2001, Linus Torvalds wrote:
> > > What was the trace of this? Just curious, the below case outlined by
> > > Linus should be pretty generic, but I'd still like to know what
> > > can lead to this condition.
> > 
> > It was posted on linux-kernel - I don't save the dang things because I
> > have too much in my "archives" as is ;)
> 
> Ok I see it now, confused wrt the different threads...
> 
> > > Good spotting. Actually I see one more problem with it too. If
> > > we've started batching (under heavy I/O of course), we could
> > > splice the pending list and wake up X number of sleepers, but
> > > there's a) no guarantee that these sleepers will actually get
> > > the requests if new ones keep flooding in
> > 
> > (a) is ok. They'll go back to sleep - it's a loop waiting for requests..
> 
> My point is not that it's broken, but it will favor newcomers
> instead of tasks having blocked on a free slot already. So it
> would still be nice to get right.
> 
> > >and b) no guarantee
> > > that X sleepers require X request slots.
> > 
> > Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
> > them), they _will_ use a request. I don't think we have to worry about
> > that. At most we will wake up "too many" - we'll wake up processes even
> > though they end up not being able to get a request anyway because somebody
> > else got to it first. And that's ok. It's the "wake up too few" that
> > causes trouble, and I think that will be fixed by my suggestion.
> 
> Yes they may end up sleeping right away again as per the above a) case
> for instance. The logic now is 'we have X free slots now, wake up
> X sleepers' where it instead should be 'we have X free slots now,
> wake up people until the free list is exhausted'.
> 
> > Now, I'd worry if somebody wants several requests at the same time, and
> > doesn't feed them to the IO layer until it has gotten all of them. In that
> > case, you can get starvation with many people having "reserved" their
> > requests, and there not be enough free requests around to actually ever
> > wake anybody up again. But the regular IO paths do not do this: they will
> > all allocate a request and just submit it immediately, no "reservation".
> 
> Right, the I/O path doesn't do this and it would seem more appropriate
> to have such users use their own requests instead of eating from
> the internal pool.
> 
> -- 
> * Jens Axboe <[EMAIL PROTECTED]>
> * SuSE Labs

Re: ps hang in 241-pre10

2001-01-27 Thread Håvard Kvålen


> Does anybody have a clue about what is different with xmms?
> 
> Does it use KNI if it can, for example? We used to have a problem
> with KNI+Athlons, for example.

No, it doesn't.

> It might also be that it's threading-related, and that XMMS is one
> of the few things that uses threads. Things like that. I'm not an
> XMMS user, can somebody who knows XMMS comment on things that it
> does that are unusual?

Yes, threads could be the thing that makes a difference.  I can't
think of anything else that is special about XMMS. 

-- 
Håvard Kvålen

Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-27 Thread Jens Axboe


On Sat, Jan 27 2001, Linus Torvalds wrote:
> > What was the trace of this? Just curious, the below case outlined by
> > Linus should be pretty generic, but I'd still like to know what
> > can lead to this condition.
> 
> It was posted on linux-kernel - I don't save the dang things because I
> have too much in my "archives" as is ;)

Ok I see it now, confused wrt the different threads...

> > Good spotting. Actually I see one more problem with it too. If
> > we've started batching (under heavy I/O of course), we could
> > splice the pending list and wake up X number of sleepers, but
> > there's a) no guarantee that these sleepers will actually get
> > the requests if new ones keep flooding in
> 
> (a) is ok. They'll go back to sleep - it's a loop waiting for requests..

My point is not that it's broken, but it will favor newcomers
instead of tasks having blocked on a free slot already. So it
would still be nice to get right.

> >  and b) no guarantee
> > that X sleepers require X request slots.
> 
> Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
> them), they _will_ use a request. I don't think we have to worry about
> that. At most we will wake up "too many" - we'll wake up processes even
> though they end up not being able to get a request anyway because somebody
> else got to it first. And that's ok. It's the "wake up too few" that
> causes trouble, and I think that will be fixed by my suggestion.

Yes, they may end up sleeping again right away, as in case a) above
for instance. The logic now is 'we have X free slots now, wake up
X sleepers' where it instead should be 'we have X free slots now,
wake up people until the free list is exhausted'.

> Now, I'd worry if somebody wants several requests at the same time, and
> doesn't feed them to the IO layer until it has gotten all of them. In that
> case, you can get starvation with many people having "reserved" their
> requests, and there not be enough free requests around to actually ever
> wake anybody up again. But the regular IO paths do not do this: they will
> all allocate a request and just submit it immediately, no "reservation".

Right, the I/O path doesn't do this and it would seem more appropriate
to have such users use their own requests instead of eating from
the internal pool.

-- 
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs

Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-27 Thread Linus Torvalds

On Sun, 28 Jan 2001, Jens Axboe wrote:
> > 
> > So what happens is that somebody takes a page fault (and gets the mm
> > lock), tries to read something in, and never gets anything back, thus
> > leaving the MM locked.
> 
> What was the trace of this? Just curious, the below case outlined by
> Linus should be pretty generic, but I'd still like to know what
> can lead to this condition.

It was posted on linux-kernel - I don't save the dang things because I
have too much in my "archives" as is ;)

> > Lorenzo, does the problem go away for you if you remove the
> > 
> > if (!list_empty(&q->request_freelist[rw])) {
> > ...
> > }
> > 
> > code from blkdev_release_request() in drivers/block/ll_rw_blk.c?
> 
> Good spotting. Actually I see one more problem with it too. If
> we've started batching (under heavy I/O of course), we could
> splice the pending list and wake up X number of sleepers, but
> there's a) no guarantee that these sleepers will actually get
> the requests if new ones keep flooding in

(a) is ok. They'll go back to sleep - it's a loop waiting for requests..

> and b) no guarantee
> that X sleepers require X request slots.

Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
them), they _will_ use a request. I don't think we have to worry about
that. At most we will wake up "too many" - we'll wake up processes even
though they end up not being able to get a request anyway because somebody
else got to it first. And that's ok. It's the "wake up too few" that
causes trouble, and I think that will be fixed by my suggestion.

Now, I'd worry if somebody wants several requests at the same time, and
doesn't feed them to the IO layer until it has gotten all of them. In that
case, you can get starvation with many people having "reserved" their
requests, and there not be enough free requests around to actually ever
wake anybody up again. But the regular IO paths do not do this: they will
all allocate a request and just submit it immediately, no "reservation".

(Obviously, _submitting_ the request doesn't mean that we'd actually start
processing it, but if somebody ends up waiting for requests they'll do the
unplug that does start it all going, so effectively we can think of it as
a logical "start this request now" thing even if it gets delayed in order
to coalesce IO).

Linus


Re: ps hang in 241-pre10

2001-01-27 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>, David Ford  <[EMAIL PROTECTED]> wrote:
>Unfortunately klogd reads /procerg.
>
>So the following is a painstakingly slow hand translation, I'll only print
>the D state entries unless someone asks otherwise.

You seem to be pretty much able to reproduce this at will, right?

I'd really like to see the raw System.map and dmesg output if your
syslogd doesn't do a proper job of getting the symbols interpreted: just
send the things by email, and I'll put something together.  It's too
hard to interpret your half-way decoded thing, and I really want to see
what this xmms thing is doing.. 

>xmms  D CACC5EA8  4116   713155   715  (NOTLB)1493   674
>Call Trace: [c0124966] [c012412f] [c01242b8] [c0144138] [c014238e]
>[] []
>   [] [] [] [] []
>
>c01248e4 T ___wait_on_page
>c0124984 t __lock_page
>
>c01240dc t truncate_list_pages
>c0124268 T truncate_inode_pages
>c01242d4 t writeout_one_page

This is the smoking gun here, I bet, but I'd like to make sure I see the
whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
but I think this is the thread that hangs on to the mm semaphore.

Linus

Re: ps hang in 241-pre10

2001-01-27 Thread Shawn Starr


Yes, I should also mention that I have a SoundBlaster AWE32 as well (0MB on the daughterboard).

J Sloan wrote:

> OK, here's the details you asked about:
>
> Soundblaster Awe 32 sound card
> Voodoo 3 pci video card
> Running Xfree86-4.0.0 (rpms from 3dfx.com)
> Playing unreal tournament, no special game
> options, just 800x600 graphics @ 16 bits.
>
> To recap, the symptoms (hung ps, etc) occurred
> on kernel 2.4.1-pre8 + low latency patches. (but
> I don't think the low latency patches had anything
> to do with it, based on the other reports)
>
> Hope this helps
>
> jjs
>
> David Ford wrote:
>
> > On 2.4.0-ac12, I played music for about 30 minutes without any problems.
> > I started up an mpeg in xmms and it locked in short order.  I'm sure now
> > that it has something to do with the graphics.  What DGA or other config
> > options do you have enabled for your game?
> >
> > What video and sound card?


Re: ps hang in 241-pre10

2001-01-27 Thread J Sloan

OK, here's the details you asked about:

Soundblaster Awe 32 sound card
Voodoo 3 pci video card
Running Xfree86-4.0.0 (rpms from 3dfx.com)
Playing unreal tournament, no special game
options, just 800x600 graphics @ 16 bits.

To recap, the symptoms (hung ps, etc) occurred
on kernel 2.4.1-pre8 + low latency patches. (but
I don't think the low latency patches had anything
to do with it, based on the other reports)

Hope this helps

jjs

David Ford wrote:

> On 2.4.0-ac12, I played music for about 30 minutes without any problems.
> I started up an mpeg in xmms and it locked in short order.  I'm sure now
> that it has something to do with the graphics.  What DGA or other config
> options do you have enabled for your game?
>
> What video and sound card?


Re: ps hang in 241-pre10

2001-01-27 Thread David Ford


It is important to note that when I hit the magic key and rebooted (SUB), a
split second before it rebooted, a stalled 'lspci' snapped back to life and
printed out my expected data.

-d

--
  There is a natural aristocracy among men. The grounds of this are virtue and 
talents. Thomas Jefferson
  The good thing about standards is that there are so many to choose from. Andrew S. 
Tanenbaum




Re: ps hang in 241-pre10

2001-01-27 Thread David Ford


On 2.4.0-ac12, I played music for about 30 minutes without any problems.
I started up an mpeg in xmms and it locked in short order.  I'm sure now
that it has something to do with the graphics.  What DGA or other config
options do you have enabled for your game?

What video and sound card?

I have an ATI Rage LT Pro AGP-133 according to lspci.

-d

J Sloan wrote:

> Sorry, there was no xmms involved here -
>
> The behavior occurred while playing unreal tournament.
>
> But at least the sound card was in use, FWIW -
>
> jjs
>
> David Ford wrote:
>
> > We've narrowed it down to "we're all running xmms" when it happened.
> >
> > -d
> >
> > J Sloan wrote:
> >
> > > Just for the record, the system where I saw the problem
> > > has only ext2 -

--
  There is a natural aristocracy among men. The grounds of this are virtue and 
talents. Thomas Jefferson
  The good thing about standards is that there are so many to choose from. Andrew S. 
Tanenbaum




Re: ps hang in 241-pre10

2001-01-27 Thread David Ford


Unfortunately klogd reads /procerg.

So the following is a painstakingly slow hand translation, I'll only print
the D state entries unless someone asks otherwise.
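
Picking the D state entries out of a raw task dump can also be scripted rather than done by eye. The following is only a sketch, assuming the dump arrives on standard input in the "name state PC ..." layout shown below; the function name d_state_tasks is made up for illustration and is not part of any tool mentioned in this thread.

```bash
#!/usr/bin/env bash
# d_state_tasks: read a SysRq "Show State" dump on stdin and print only the
# tasks in state D (uninterruptible sleep), together with the lines that
# follow each of them (their Call Trace) up to the next task header.
d_state_tasks() {
    local line name state pc rest printing=0
    while IFS= read -r line; do
        # Split the line into the fields a task header would have.
        read -r name state pc rest <<< "$line"
        # Assumption: a task header is "<name> <one-letter state> <hex PC> ...".
        if [[ $state =~ ^[RSDTZW]$ && $pc =~ ^[0-9A-Fa-f]+$ ]]; then
            if [[ $state == D ]]; then printing=1; else printing=0; fi
        fi
        if (( printing )); then
            printf '%s\n' "$line"
        fi
    done
    return 0
}
```

It would be used as `dmesg | d_state_tasks`. Lines where the columns have run together (a task name fused with its state letter, for example) are not recognised as headers and still need to be checked by hand.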

Prior to this:
XMMS is running playing star wars mpeg. (regular user) (frozen)
TOP is running (regular user) (frozen)
while [ 1 ]; do ls -laR /proc ; done (regular user) (frozen)
skill -9 xmms (root) (frozen)
X 4.0.2 running, scp of 600meg file over pegasus usb ethernet (10Mbit).

syslog caught:
Jan 27 16:42:26 nifty kernel: SysRq: Show State
Jan 27 16:42:26 nifty kernel:
Jan 27 16:42:26 nifty kernel:                          free                        sibling
Jan 27 16:42:26 nifty kernel:   task             PC    stack   pid father child younger older
Jan 27 16:42:26 nifty kernel: init      S CBFEBF2C  3184     1      0   187  (NOTLB)


dmesg shows (only D state for brevity):
top   D CA98B3DC  4440   219158(NOTLB)
Call Trace: [c010791d] [c0107a68] [c02f73dd] [c014b5cb] [c01311e6]
[c0108d5f] [c010002b]

c01078c8 T __down
c0107964 T __down_interruptible
c0107a28 T __down_trylock
c0107a60 T __down_failed
c0107a6c T __down_failed_interruptible

c02f6a00 T stext_lock
c02f827e A _etext

c014b578 t proc_info_read
c014b688 t mem_read

c0131150 T sys_read
c013121c T sys_write

c0108d2c T system_call
c0108d64 T ret_from_sys_call

c010 t startup_32
c0100139 t is486


xmms  D CACC5EA8  4116   713155   715  (NOTLB)1493   674
Call Trace: [c0124966] [c012412f] [c01242b8] [c0144138] [c014238e]
[] []
   [] [] [] [] []

c01248e4 T ___wait_on_page
c0124984 t __lock_page

c01240dc t truncate_list_pages
c0124268 T truncate_inode_pages
c01242d4 t writeout_one_page

c0144094 T remove_inode_hash
c01440a8 T iput
c01441fc T force_delete

c01422a0 T dput
c01423e4 T d_invalidate

c0131c58 T fput
c0131d28 T fget

c012365c t unmap_fixup
c0123788 t free_pgtables

c012380c T do_munmap
c0123a5c T sys_munmap

...ask if you want more

xmms  S C2979F30 0   715713   725  (NOTLB)
Call Trace: [] [] [] [] []
[]
xmms  S C2B75F2C  1156   716715(NOTLB) 718
Call Trace: [] [] [] [] []
xmms  S 7FFF 0   718715(NOTLB) 719   716
Call Trace: [] [] [] []
xmms  S C2975F88   832   719715(NOTLB) 725   718
Call Trace: [] [] [] [] []
xmms  S CA8D7F88  2672   725715(NOTLB)   719
Call Trace: [] [] [] []

c0114240 t process_timeout
c0114288 T schedule_timeout
c011431c T schedule_tail

c0113d70 t remap_area_pages
c0114020 T __ioremap

c0108d2c T system_call
c0108d64 T ret_from_sys_call


ls    D CA98B3DC 0  1896222(NOTLB)
Call Trace: [] [] [] [] []
[]
skill D CA98B3DC 0  1897187(NOTLB)
Call Trace: [] [] [] [] []
[]

c0107964 T __down_interruptible
c0107a28 T __down_trylock
c0107a60 T __down_failed
c0107a6c T __down_failed_interruptible

c02f6a00 T stext_lock
c02f827e A _etext
 ...


SysRq: Show Memory
Mem-info:
Free pages:2240kB ( 0kB HighMem)
( Active: 4153, inactive_dirty: 198, inactive_clean: 1077, free: 560 (383 766
1149) )
31*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB =
660kB)
125*4kB 5*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB
= 1580kB)
= 0kB)
Swap cache: add 3165, delete 547, find 25/124
Free swap:53104kB
49136 pages of RAM
0 pages of HIGHMEM
1798 reserved pages
2619 pages shared
2618 pages swap cached
0 pages in page table cache
Buffer memory: 1276kB

-d

--
  There is a natural aristocracy among men. The grounds of this are virtue and 
talents. Thomas Jefferson
  The good thing about standards is that there are so many to choose from. Andrew S. 
Tanenbaum




Re: ps hang in 241-pre10

2001-01-27 Thread Aaron Lehmann


On Sat, Jan 27, 2001 at 04:42:45PM -0800, J Sloan wrote:
> But at least the sound card was in use, FWIW -

Not for me. My xmms was sitting idle when it froze.

Re: ps hang in 241-pre10

2001-01-27 Thread J Sloan


Sorry, there was no xmms involved here -

The behavior occurred while playing unreal tournament.

But at least the sound card was in use, FWIW -

jjs

David Ford wrote:

> We've narrowed it down to "we're all running xmms" when it happened.
>
> -d
>
> J Sloan wrote:
>
> > Just for the record, the system where I saw the problem
> > has only ext2 -


Re: ps hang in 241-pre10

2001-01-27 Thread David Ford

Linus Torvalds wrote:

> In article <[EMAIL PROTECTED]>, David Ford  <[EMAIL PROTECTED]>
> wrote:
> >
> >We've narrowed it down to "we're all running xmms" when it happened.
>
> Does anybody have a clue about what is different with xmms?

Not sure.

> Does it use KNI if it can, for example? We used to have a problem with
> KNI+Athlons, for example.
>
> It might also be that it's threading-related, and that XMMS is one of
> the few things that uses threads. Things like that. I'm not an XMMS
> user, can somebody who knows XMMS comment on things that it does that
> are unusual?

If I was clued enough to know KNI, I could say for a certainty.  I am
assuming it's a form of MMX or related.  My notebook is a mobile pII 366.

I'm stress testing it now with ac12.  I originally had pre9 on it.  There is
one difference other than that, I have Marcelo's bg aging patch on here which
seems to have improved responsiveness significantly but I'll save that for
another story.

I've triggered it, report follows in next email.

-d

--
  There is a natural aristocracy among men. The grounds of this are virtue and 
talents. Thomas Jefferson
  The good thing about standards is that there are so many to choose from. Andrew S. 
Tanenbaum


Re: ps hang in 241-pre10

2001-01-27 Thread Shawn Starr


This system is the following:

AcerOPEN AP53/AX Motherboard, Intel Pentium 200MHz w/o MMX (1996-1997)
Chipsets: 430HX, PIIX3 (EIDE)

64MB RAM EDO 60ns (Kingston brand)


Linus Torvalds wrote:

> In article <[EMAIL PROTECTED]>, David Ford  <[EMAIL PROTECTED]>
> wrote:
> >
> >We've narrowed it down to "we're all running xmms" when it happened.
>
> Does anybody have a clue about what is different with xmms?
>
> Does it use KNI if it can, for example? We used to have a problem with
> KNI+Athlons, for example.
>
> It might also be that it's threading-related, and that XMMS is one of
> the few things that uses threads. Things like that. I'm not an XMMS
> user, can somebody who knows XMMS comment on things that it does that
> are unusual?
>
> Linus


Re: ps hang in 241-pre10

2001-01-27 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>, David Ford  <[EMAIL PROTECTED]>
wrote:
>
>We've narrowed it down to "we're all running xmms" when it happened.

Does anybody have a clue about what is different with xmms?

Does it use KNI if it can, for example? We used to have a problem with
KNI+Athlons, for example. 

It might also be that it's threading-related, and that XMMS is one of
the few things that uses threads. Things like that. I'm not an XMMS
user, can somebody who knows XMMS comment on things that it does that
are unusual?

Linus

Re: ps hang in 241-pre10

2001-01-27 Thread David Ford


We've narrowed it down to "we're all running xmms" when it happened.

-d

J Sloan wrote:

> Just for the record, the system where I saw the problem
> has only ext2 -

--
  There is a natural aristocracy among men. The grounds of this are virtue and 
talents. Thomas Jefferson
  The good thing about standards is that there are so many to choose from. Andrew S. 
Tanenbaum




Re: ps hang in 241-pre10

2001-01-27 Thread Aaron Lehmann


On Sat, Jan 27, 2001 at 04:33:42AM -0500, Shawn Starr wrote:
> Yes, I have ReiserFS as well...hrm...

I don't.

Re: ps hang in 241-pre10

2001-01-27 Thread J Sloan


Just for the record, the system where I saw the problem
has only ext2 -

jjs

Shawn Starr wrote:

> Yes, I have ReiserFS as well...hrm...
>
> David Ford wrote:
>
> > I can quickly and easily duplicate it on my notebook by playing music or
> > mpegs in xmms.  It may take a few minutes but it's guaranteed.
> >
> > xmms stalls flat on its face and anything accessing /proc stalls.  If I get
> > the time to do it, I'll take a gander at it with kdb.
> >
> > I have no patches applied to p10, I have reiserfs onboard but I highly doubt
> > it's reiserfs.


Re: ps hang in 241-pre10

2001-01-27 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>, David Ford  <[EMAIL PROTECTED]> wrote:
>I can quickly and easily duplicate it on my notebook by playing music or
>mpegs in xmms.  It may take a few minutes but it's guaranteed.
>
>xmms stalls flat on its face and anything accessing /proc stalls.  If I get
>the time to do it, I'll take a gander at it with kdb.

Please, if you see something like this, just do a simple
Alt+ScrollLock followed by Ctrl+ScrollLock while in text-mode. The
magic keystrokes will give a stack trace of the currently running
process and all processes respectively.

Then, just look in your /var/log/messages, and if you have everything
set up correctly the system should have done the conversion to symbolic
kernel addresses for you - so you can see directly where the different
processes are sleeping.

Sanity-check that your System.map information (and thus the symbolic
conversion) looks to be ok: the processes that hang should show up in the
trace as being in __down_failed() or something like that. The only
reason for a hang with /proc/<pid>/ tends to be that some process would
have deadlocked on its MM semaphore or is somehow stuck inside its
critical region on something else.
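
If the log daemon did not do the symbolic conversion, an address from a Call Trace line can be matched against System.map by hand: take the symbol with the highest address that is still at or below the traced address. The following is only a sketch of that lookup, assuming a System.map in the usual "address type name" layout and 32-bit addresses like the ones in this thread; the function name resolve_addr is made up for illustration.

```bash
#!/usr/bin/env bash
# resolve_addr MAPFILE ADDRESS
# Print "symbol+0xoffset" for the symbol in MAPFILE whose address is the
# highest one not above ADDRESS (hex, with or without a leading 0x).
# Returns non-zero if no symbol lies at or below ADDRESS.
resolve_addr() {
    local map=$1 addr=$((16#${2#0x}))
    local sym_addr sym_type sym_name value best_name="" best_addr=-1
    while read -r sym_addr sym_type sym_name; do
        # Skip anything that does not start with a hexadecimal address.
        [[ $sym_addr =~ ^[0-9a-fA-F]+$ ]] || continue
        value=$((16#$sym_addr))
        # Keep the closest symbol at or below the address; the map does
        # not have to be sorted for this to work.
        if (( value <= addr && value > best_addr )); then
            best_addr=$value
            best_name=$sym_name
        fi
    done < "$map"
    [ -n "$best_name" ] || return 1
    printf '%s+0x%x\n' "$best_name" $(( addr - best_addr ))
}
```

For a map containing the line `c01248e4 T ___wait_on_page`, with the next symbol at c0124984, `resolve_addr System.map c0124966` prints `___wait_on_page+0x82`. A System.map that does not belong to the running kernel gives plausible-looking but wrong names, so the sanity check above still applies.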

Finally, try to pinpoint _which_ process it is. Usually most easily done
by simply seeing where it is that the /proc accesses get stuck, with
something simple like

cd /proc
for i in [0-9]*; do
  echo $i
  cat $i/stat > /dev/null
done

and see what the last pid it printed out was (not that the above
guarantees that you found the thing, because there might be several
things. But it's one more piece to the puzzle).
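
One drawback of the loop above is that the shell running it gets stuck on the first bad entry too. A variant that keeps going is sketched below, assuming bash; the function name scan_proc and the default wait time are made up for illustration. It starts every read in the background, waits once, and then reports the entries whose read has still not finished.

```bash
#!/usr/bin/env bash
# scan_proc [DIR] [SECONDS]
# Print the numeric entries under DIR (default /proc) whose stat file could
# not be read within SECONDS (default 2).
scan_proc() {
    local dir=${1:-/proc} wait_s=${2:-2} entry i
    local -a names=() readers=()
    for entry in "$dir"/[0-9]*; do
        [ -d "$entry" ] || continue
        # Read in the background so a blocked read cannot hang this loop.
        cat "$entry/stat" > /dev/null 2>&1 &
        names+=("${entry##*/}")
        readers+=("$!")
    done
    sleep "$wait_s"
    for i in "${!readers[@]}"; do
        # A reader that is still alive after the wait is blocked.
        if kill -0 "${readers[$i]}" 2>/dev/null; then
            echo "${names[$i]}"
            # Best effort only: a reader in D state ignores signals.
            kill "${readers[$i]}" 2>/dev/null
        fi
    done
    return 0
}
```

Running `scan_proc /proc 2` lists every pid that could not be read, not just the first one. The blocked readers stay behind in D state until the lock they wait on is released, so this leaves a few extra stuck processes on an already hung system.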

And send the information to the kernel mailing list, along with anything
else you might think of.

Linus

Re: ps hang in 241-pre10

2001-01-27 Thread John Sheahan


I have not compiled or used reiserfs here yet.
I'm compiling Mike's semaphore debug patch now and adding sysrq,
but this took three days to happen just once here.
..john


Shawn Starr wrote:
> 
> Yes, I have ReiserFS as well...hrm...
>

Re: ps hang in 241-pre10

2001-01-27 Thread Shawn Starr


Yes, I have ReiserFS as well...hrm...

David Ford wrote:

> I can quickly and easily duplicate it on my notebook by playing music or
> mpegs in xmms.  It may take a few minutes but it's guaranteed.
>
> xmms stalls flat on its face and anything accessing /proc stalls.  If I get
> the time to do it, I'll take a gander at it with kdb.
>
> I have no patches applied to p10, I have reiserfs onboard but I highly doubt
> it's reiserfs.
>
> -d
>
> J Sloan wrote:
>
> > OK, It's official now, I didn't know if it was some
> > weird hardware fluke or something, but one of
> > the computers here exhibited the same problem -


Re: ps hang in 241-pre10

2001-01-27 Thread David Ford

I can quickly and easily duplicate it on my notebook by playing music or
mpegs in xmms.  It may take a few minutes but it's guaranteed.

xmms stalls flat on its face and anything accessing /proc stalls.  If I get
the time to do it, I'll take a gander at it with kdb.

I have no patches applied to p10, I have reiserfs onboard but I highly doubt
it's reiserfs.

-d

J Sloan wrote:

> OK, It's official now, I didn't know if it was some
> weird hardware fluke or something, but one of
> the computers here exhibited the same problem -

--
  There is a natural aristocracy among men. The grounds of this are virtue and 
talents. Thomas Jefferson
  The good thing about standards is that there are so many to choose from. Andrew S. 
Tanenbaum


Re: ps hang in 241-pre10

2001-01-27 Thread David Ford


At the time I had temporary access to my notebook and had a mismatched System.map
file :S

-d

Linus Torvalds wrote:

> In article <[EMAIL PROTECTED]>, David Ford  <[EMAIL PROTECTED]> wrote:
> >I can quickly and easily duplicate it on my notebook by playing music or
> >mpegs in xmms.  It may take a few minutes but it's guaranteed.
> >
> >xmms stalls flat on its face and anything accessing /proc stalls.  If I get
> >the time to do it, I'll take a gander at it with kdb.
>
> Please, if you see something like this, just do a simple
> Alt+ScrollLock followed by Ctrl+ScrollLock while in text-mode. The
> magic keystrokes will give a stack trace of the currently running
> process and all processes respectively.

--
  There is a natural aristocracy among men. The grounds of this are virtue and 
talents. Thomas Jefferson
  The good thing about standards is that there are so many to choose from. Andrew S. 
Tanenbaum




Re: ps hang in 241-pre10

2001-01-27 Thread Shawn Starr


This system is the following:

AcerOPEN AP53/AX Motherboard, Intel Pentium 200Mhz w/o MMX (1996-1997)
Chipsets: 430HX, PIIX3 (EIDE)

64MB RAM EDO 60ns (Kingston brand)


Linus Torvalds wrote:

 In article [EMAIL PROTECTED], David Ford  [EMAIL PROTECTED]
 wrote:
 
 We've narrowed it down to "we're all running xmms" when it happend.

 Does anybody have a clue about what is different with xmms?

 Does it use KNI if it can, for example? We used to have a problem with
 KNI+Athlons, for example.

 It might also be that it's threading-related, and that XMMS is one of
 the few things that uses threads. Things like that. I'm not an XMMS
 user, can somebody who knows XMMS comment on things that it does that
 are unusual?

 Linus
 -
 To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
 the body of a message to [EMAIL PROTECTED]
 Please read the FAQ at http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: ps hang in 241-pre10

2001-01-27 Thread J Sloan


Sorry, there was no xmms involved here -

The behavior occurred while playing unreal tournament.

But at least the sound card was in use, FWIW -

jjs

David Ford wrote:

 We've narrowed it down to "we're all running xmms" when it happend.

 -d

 J Sloan wrote:

  Just for the record, the system where I saw the problem
  has only ext2 -


Re: ps hang in 241-pre10

2001-01-27 Thread Aaron Lehmann


On Sat, Jan 27, 2001 at 04:42:45PM -0800, J Sloan wrote:
> But at least the sound card was in use, FWIW -

Not for me. My xmms was sitting idle when it froze.

Re: ps hang in 241-pre10

2001-01-27 Thread David Ford


Unfortunately klogd reads /procerg.

So the following is a painstakingly slow hand translation, I'll only print
the D state entries unless someone asks otherwise.

Prior to this:
XMMS is running playing star wars mpeg. (regular user) (frozen)
TOP is running (regular user) (frozen)
while [ 1 ]; do ls -laR /proc ; done (regular user) (frozen)
skill -9 xmms (root) (frozen)
X 4.0.2 running, scp of 600meg file over pegasus usb ethernet (10Mbit).

syslog caught:
Jan 27 16:42:26 nifty kernel: SysRq: Show State
Jan 27 16:42:26 nifty kernel:
Jan 27 16:42:26 nifty kernel:                          free                        sibling
Jan 27 16:42:26 nifty kernel:   task             PC    stack   pid father child younger older
Jan 27 16:42:26 nifty kernel: init      S CBFEBF2C  3184     1      0   187               (NOTLB)
end

dmesg shows (only D state for brevity):
top   D CA98B3DC  4440   219    158  (NOTLB)
Call Trace: [c010791d] [c0107a68] [c02f73dd] [c014b5cb] [c01311e6]
[c0108d5f] [c010002b]

c01078c8 T __down
c0107964 T __down_interruptible
c0107a28 T __down_trylock
c0107a60 T __down_failed
c0107a6c T __down_failed_interruptible

c02f6a00 T stext_lock
c02f827e A _etext

c014b578 t proc_info_read
c014b688 t mem_read

c0131150 T sys_read
c013121c T sys_write

c0108d2c T system_call
c0108d64 T ret_from_sys_call

c0100000 t startup_32
c0100139 t is486


xmms  D CACC5EA8  4116   713    155   715  (NOTLB)  1493   674
Call Trace: [c0124966] [c012412f] [c01242b8] [c0144138] [c014238e]
[c0131cd0] [c01236b2]
   [c01239f2] [c01ac5ca] [c010d1f6] [c0108e7c] [c0108d5f]

c01248e4 T ___wait_on_page
c0124984 t __lock_page

c01240dc t truncate_list_pages
c0124268 T truncate_inode_pages
c01242d4 t writeout_one_page

c0144094 T remove_inode_hash
c01440a8 T iput
c01441fc T force_delete

c01422a0 T dput
c01423e4 T d_invalidate

c0131c58 T fput
c0131d28 T fget

c012365c t unmap_fixup
c0123788 t free_pgtables

c012380c T do_munmap
c0123a5c T sys_munmap

...ask if you want more

xmms  S C2979F30 0   715    713   725  (NOTLB)
Call Trace: [c01142fb] [c0114240] [c013f95e] [c013fb53] [c0119fff]
[c0108d5f]
xmms  S C2B75F2C  1156   716    715  (NOTLB)   718
Call Trace: [c01142fb] [c0114240] [c013f341] [c013f6e0] [c0108d5f]
xmms  S 7FFF 0   718    715  (NOTLB)   719   716
Call Trace: [c011429f] [c013f341] [c013f6e0] [c0108d5f]
xmms  S C2975F88   832   719    715  (NOTLB)   725   718
Call Trace: [c01142fb] [c0114240] [c011d468] [c0108d5f] [c010002b]
xmms  S CA8D7F88  2672   725    715  (NOTLB)         719
Call Trace: [c01142fb] [c0114240] [c011d468] [c0108d5f]

c0114240 t process_timeout
c0114288 T schedule_timeout
c011431c T schedule_tail

c0113d70 t remap_area_pages
c0114020 T __ioremap

c0108d2c T system_call
c0108d64 T ret_from_sys_call


ls    D CA98B3DC 0  1896    222  (NOTLB)
Call Trace: [c010791d] [c0107a68] [c02f73b5] [c014b95a] [c01389a2]
[c0108d5f]
skill D CA98B3DC 0  1897    187  (NOTLB)
Call Trace: [c010791d] [c0107a68] [c02f73dd] [c014b5cb] [c01311e6]
[c0108d5f]

c0107964 T __down_interruptible
c0107a28 T __down_trylock
c0107a60 T __down_failed
c0107a6c T __down_failed_interruptible

c02f6a00 T stext_lock
c02f827e A _etext
 ...


SysRq: Show Memory
Mem-info:
Free pages:        2240kB (     0kB HighMem)
( Active: 4153, inactive_dirty: 198, inactive_clean: 1077, free: 560 (383 766 1149) )
31*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB = 660kB)
125*4kB 5*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB = 1580kB)
= 0kB)
Swap cache: add 3165, delete 547, find 25/124
Free swap:        53104kB
49136 pages of RAM
0 pages of HIGHMEM
1798 reserved pages
2619 pages shared
2618 pages swap cached
0 pages in page table cache
Buffer memory: 1276kB
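
[Editorial sketch] The hand translation of the call traces above amounts to finding, for each trace address, the System.map symbol with the greatest start address that does not exceed it. A minimal C sketch of that lookup follows. The names toy_sym, toy_lookup and toy_map are hypothetical and are not part of klogd or ksymoops; the table holds only entries quoted in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One System.map entry: a start address and the symbol name. */
struct toy_sym {
    unsigned long addr;
    const char *name;
};

/*
 * Return the entry with the greatest address <= target, or NULL if the
 * target lies below the first entry. The table must be sorted by
 * ascending address, as System.map is.
 */
static const struct toy_sym *toy_lookup(const struct toy_sym *tab, size_t n,
                                        unsigned long target)
{
    size_t lo = 0, hi = n;              /* search window is [lo, hi) */

    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= target)
            lo = mid + 1;               /* candidate; look for a later one */
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &tab[lo - 1];
}

/* Entries copied from the System.map excerpts quoted in this message. */
static const struct toy_sym toy_map[] = {
    { 0xc01078c8UL, "__down" },
    { 0xc0107964UL, "__down_interruptible" },
    { 0xc0107a28UL, "__down_trylock" },
    { 0xc0107a60UL, "__down_failed" },
    { 0xc0107a6cUL, "__down_failed_interruptible" },
    { 0xc01248e4UL, "___wait_on_page" },
    { 0xc0124984UL, "__lock_page" },
    { 0xc014b578UL, "proc_info_read" },
    { 0xc014b688UL, "mem_read" },
};
```

With this table, the address c010791d from the top trace resolves to __down and c0124966 from the xmms trace resolves to ___wait_on_page. A partial table like this one will mis-attribute any address that falls in a gap between the listed symbols, so a real lookup needs the complete System.map.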

-d

--
  There is a natural aristocracy among men. The grounds of this are virtue and 
talents. Thomas Jefferson
  The good thing about standards is that there are so many to choose from. Andrew S. 
Tanenbaum




Re: ps hang in 241-pre10

2001-01-27 Thread David Ford


On 2.4.0-ac12, I played music for about 30 minutes without any problems.
I started up an mpeg in xmms and it locked in short order.  I'm sure now
that it has something to do with the graphics.  What DGA or other config
options do you have enabled for your game?

What video and sound card?

I have an ATI Rage LT Pro AGP-133 according to lspci.

-d

J Sloan wrote:

> Sorry, there was no xmms involved here -
>
> The behavior occurred while playing unreal tournament.
>
> But at least the sound card was in use, FWIW -
>
> jjs
>
> David Ford wrote:
>
> > We've narrowed it down to "we're all running xmms" when it happened.
> >
> > -d
> >
> > J Sloan wrote:
> >
> > > Just for the record, the system where I saw the problem
> > > has only ext2 -
 

Re: ps hang in 241-pre10

2001-01-27 Thread David Ford


It is important to note that when I hit the magic SysRq key and rebooted
(S, U, B: sync, unmount, reboot), a split second before it rebooted, a
stalled 'lspci' snapped back to life and printed out my expected data.

-d


Re: ps hang in 241-pre10

2001-01-27 Thread J Sloan


OK, here's the details you asked about:

Soundblaster Awe 32 sound card
Voodoo 3 pci video card
Running Xfree86-4.0.0 (rpms from 3dfx.com)
Playing unreal tournament, no special game
options, just 800x600 graphics @ 16 bits.

To recap, the symptoms (hung ps, etc) occurred
on kernel 2.4.1-pre8 + low latency patches. (but
I don't think the low latency patches had anything
to do with it, based on the other reports)

Hope this helps

jjs

David Ford wrote:

> On 2.4.0-ac12, I played music for about 30 minutes without any problems.
> I started up an mpeg in xmms and it locked in short order.  I'm sure now
> that it has something to do with the graphics.  What DGA or other config
> options do you have enabled for your game?
>
> What video and sound card?


Re: ps hang in 241-pre10

2001-01-27 Thread Shawn Starr


Yes, I should also mention that I also have a SoundBlaster AWE32 (0 MB on the daughterboard).

J Sloan wrote:

> OK, here's the details you asked about:
>
> Soundblaster Awe 32 sound card
> Voodoo 3 pci video card
> Running Xfree86-4.0.0 (rpms from 3dfx.com)
> Playing unreal tournament, no special game
> options, just 800x600 graphics @ 16 bits.
>
> To recap, the symptoms (hung ps, etc) occurred
> on kernel 2.4.1-pre8 + low latency patches. (but
> I don't think the low latency patches had anything
> to do with it, based on the other reports)
>
> Hope this helps
>
> jjs
>
> David Ford wrote:
>
> > On 2.4.0-ac12, I played music for about 30 minutes without any problems.
> > I started up an mpeg in xmms and it locked in short order.  I'm sure now
> > that it has something to do with the graphics.  What DGA or other config
> > options do you have enabled for your game?
> >
> > What video and sound card?


Re: ps hang in 241-pre10

2001-01-27 Thread Linus Torvalds


In article <[EMAIL PROTECTED]>, David Ford <[EMAIL PROTECTED]> wrote:
> Unfortunately klogd reads /procerg.
>
> So the following is a painstakingly slow hand translation, I'll only print
> the D state entries unless someone asks otherwise.

You seem to be pretty much able to reproduce this at will, right?

I'd really like to see the raw System.map and dmesg output if your
syslogd doesn't do a proper job of getting the symbols interpreted: just
send the things by email, and I'll put something together.  It's too
hard to interpret your half-way decoded thing, and I really want to see
what this xmms thing is doing.. 

> xmms  D CACC5EA8  4116   713    155   715  (NOTLB)  1493   674
> Call Trace: [c0124966] [c012412f] [c01242b8] [c0144138] [c014238e]
> [c0131cd0] [c01236b2]
>    [c01239f2] [c01ac5ca] [c010d1f6] [c0108e7c] [c0108d5f]
>
> c01248e4 T ___wait_on_page
> c0124984 t __lock_page
>
> c01240dc t truncate_list_pages
> c0124268 T truncate_inode_pages
> c01242d4 t writeout_one_page

This is the smoking gun here, I bet, but I'd like to make sure I see the
whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
but I think this is the thread that hangs on to the mm semaphore.

Linus

Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-27 Thread Linus Torvalds




On Sun, 28 Jan 2001, Jens Axboe wrote:
> >
> > So what happens is that somebody takes a page fault (and gets the mm
> > lock), tries to read something in, and never gets anything back, thus
> > leaving the MM locked.
>
> What was the trace of this? Just curious, the below case outlined by
> Linus should be pretty generic, but I'd still like to know what
> can lead to this condition.

It was posted on linux-kernel - I don't save the dang things because I
have too much in my "archives" as is ;)

> > Lorenzo, does the problem go away for you if you remove the
> >
> > 	if (!list_empty(&q->request_freelist[rw])) {
> > 		...
> > 	}
> >
> > code from blkdev_release_request() in drivers/block/ll_rw_block.c?
>
> Good spotting. Actually I see one more problem with it too. If
> we've started batching (under heavy I/O of course), we could
> splice the pending list and wake up X number of sleepers, but
> there's a) no guarantee that these sleepers will actually get
> the requests if new ones keep flooding in

(a) is ok. They'll go back to sleep - it's a loop waiting for requests..

> and b) no guarantee
> that X sleepers require X request slots.

Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
them), they _will_ use a request. I don't think we have to worry about
that. At most we will wake up "too many" - we'll wake up processes even
though they end up not being able to get a request anyway because somebody
else got to it first. And that's ok. It's the "wake up too few" that
causes trouble, and I think that will be fixed by my suggestion.
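
[Editorial sketch] The asymmetry described above (waking too many sleepers is harmless, waking too few strands them) can be shown with a toy single-threaded model. This is not kernel code: struct toy_pool and toy_wake are hypothetical names, and the model only assumes that every woken sleeper re-checks the free list and goes back to sleep if it finds nothing.

```c
#include <assert.h>

/* Toy model of a request pool with tasks sleeping on it. */
struct toy_pool {
    int free_slots;  /* request slots currently free */
    int sleepers;    /* tasks asleep waiting for a slot */
};

/*
 * Wake up to nr sleepers. Each woken task takes a slot if one is free;
 * otherwise it simply goes back to sleep. Returns how many got a slot.
 */
static int toy_wake(struct toy_pool *p, int nr)
{
    int woken, served = 0, i;

    assert(p->free_slots >= 0 && p->sleepers >= 0 && nr >= 0);
    woken = nr < p->sleepers ? nr : p->sleepers;
    for (i = 0; i < woken; i++) {
        if (p->free_slots == 0)
            continue;        /* lost the race: back to sleep */
        p->free_slots--;
        p->sleepers--;
        served++;
    }
    return served;
}
```

Starting from 2 free slots and 5 sleepers, waking all 5 serves 2 and leaves 3 asleep with nothing free, which is fine. Starting from 3 free slots and 3 sleepers, waking only 1 leaves 2 tasks asleep while 2 slots sit free; if no further wake-up arrives, those tasks never run.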

Now, I'd worry if somebody wants several requests at the same time, and
doesn't feed them to the IO layer until it has gotten all of them. In that
case, you can get starvation with many people having "reserved" their
requests, and there not be enough free requests around to actually ever
wake anybody up again. But the regular IO paths do not do this: they will
all allocate a request and just submit it immediately, no "reservation".

(Obviously, _submitting_ the request doesn't mean that we'd actually start
processing it, but if somebody ends up waiting for requests they'll do the
unplug that does start it all going, so effectively we can think of it as
a logical "start this request now" thing even if it gets delayed in order
to coalesce IO).

Linus


Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-27 Thread Jens Axboe


On Sat, Jan 27 2001, Linus Torvalds wrote:
> > What was the trace of this? Just curious, the below case outlined by
> > Linus should be pretty generic, but I'd still like to know what
> > can lead to this condition.
>
> It was posted on linux-kernel - I don't save the dang things because I
> have too much in my "archives" as is ;)

Ok I see it now, confused wrt the different threads...

> > Good spotting. Actually I see one more problem with it too. If
> > we've started batching (under heavy I/O of course), we could
> > splice the pending list and wake up X number of sleepers, but
> > there's a) no guarantee that these sleepers will actually get
> > the requests if new ones keep flooding in
>
> (a) is ok. They'll go back to sleep - it's a loop waiting for requests..

My point is not that it's broken, but it will favor newcomers
instead of tasks having blocked on a free slot already. So it
would still be nice to get right.

> > and b) no guarantee
> > that X sleepers require X request slots.
>
> Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
> them), they _will_ use a request. I don't think we have to worry about
> that. At most we will wake up "too many" - we'll wake up processes even
> though they end up not being able to get a request anyway because somebody
> else got to it first. And that's ok. It's the "wake up too few" that
> causes trouble, and I think that will be fixed by my suggestion.

Yes they may end up sleeping right away again as per the above a) case
for instance. The logic now is 'we have X free slots now, wake up
X sleepers' where it instead should be 'we have X free slots now,
wake up people until the free list is exhausted'.

> Now, I'd worry if somebody wants several requests at the same time, and
> doesn't feed them to the IO layer until it has gotten all of them. In that
> case, you can get starvation with many people having "reserved" their
> requests, and there not be enough free requests around to actually ever
> wake anybody up again. But the regular IO paths do not do this: they will
> all allocate a request and just submit it immediately, no "reservation".

Right, the I/O path doesn't do this and it would seem more appropriate
to have such users use their own requests instead of eating from
the internal pool.

-- 
* Jens Axboe [EMAIL PROTECTED]
* SuSE Labs

Re: ps hang in 241-pre10

2001-01-27 Thread Håvard Kvålen


> Does anybody have a clue about what is different with xmms?
>
> Does it use KNI if it can, for example? We used to have a problem
> with KNI+Athlons, for example.

No, it doesn't.

> It might also be that it's threading-related, and that XMMS is one
> of the few things that uses threads. Things like that. I'm not an
> XMMS user, can somebody who knows XMMS comment on things that it
> does that are unusual?

Yes, threads could be the thing that makes a difference.  I can't
think of anything else that is special about XMMS.

-- 
Håvard Kvålen

Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-27 Thread Marcelo Tosatti



On Sun, 28 Jan 2001, Jens Axboe wrote:

> On Sat, Jan 27 2001, Linus Torvalds wrote:
> > > What was the trace of this? Just curious, the below case outlined by
> > > Linus should be pretty generic, but I'd still like to know what
> > > can lead to this condition.
> >
> > It was posted on linux-kernel - I don't save the dang things because I
> > have too much in my "archives" as is ;)
>
> Ok I see it now, confused wrt the different threads...
>
> > > Good spotting. Actually I see one more problem with it too. If
> > > we've started batching (under heavy I/O of course), we could
> > > splice the pending list and wake up X number of sleepers, but
> > > there's a) no guarantee that these sleepers will actually get
> > > the requests if new ones keep flooding in
> >
> > (a) is ok. They'll go back to sleep - it's a loop waiting for requests..
>
> My point is not that it's broken, but it will favor newcomers
> instead of tasks having blocked on a free slot already. So it
> would still be nice to get right.
>
> > > and b) no guarantee
> > > that X sleepers require X request slots.
> >
> > Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
> > them), they _will_ use a request. I don't think we have to worry about
> > that. At most we will wake up "too many" - we'll wake up processes even
> > though they end up not being able to get a request anyway because somebody
> > else got to it first. And that's ok. It's the "wake up too few" that
> > causes trouble, and I think that will be fixed by my suggestion.
>
> Yes they may end up sleeping right away again as per the above a) case
> for instance. The logic now is 'we have X free slots now, wake up
> X sleepers' where it instead should be 'we have X free slots now,
> wake up people until the free list is exhausted'.
>
> > Now, I'd worry if somebody wants several requests at the same time, and
> > doesn't feed them to the IO layer until it has gotten all of them. In that
> > case, you can get starvation with many people having "reserved" their
> > requests, and there not be enough free requests around to actually ever
> > wake anybody up again. But the regular IO paths do not do this: they will
> > all allocate a request and just submit it immediately, no "reservation".
>
> Right, the I/O path doesn't do this and it would seem more appropriate
> to have such users use their own requests instead of eating from
> the internal pool.
>
> --
> * Jens Axboe [EMAIL PROTECTED]
> * SuSE Labs

Re: ps hang in 241-pre10

2001-01-27 Thread Marcelo Tosatti



(ugh, sorry about last mail)

On 27 Jan 2001, Linus Torvalds wrote:

> In article <[EMAIL PROTECTED]>, David Ford <[EMAIL PROTECTED]> wrote:
> > Unfortunately klogd reads /procerg.
> >
> > So the following is a painstakingly slow hand translation, I'll only print
> > the D state entries unless someone asks otherwise.
>
> You seem to be pretty much able to reproduce this at will, right?
>
> I'd really like to see the raw System.map and dmesg output if your
> syslogd doesn't do a proper job of getting the symbols interpreted: just
> send the things by email, and I'll put something together.  It's too
> hard to interpret your half-way decoded thing, and I really want to see
> what this xmms thing is doing..
>
> > xmms  D CACC5EA8  4116   713    155   715  (NOTLB)  1493   674
> > Call Trace: [c0124966] [c012412f] [c01242b8] [c0144138] [c014238e]
> > [c0131cd0] [c01236b2]
> >    [c01239f2] [c01ac5ca] [c010d1f6] [c0108e7c] [c0108d5f]
> >
> > c01248e4 T ___wait_on_page
> > c0124984 t __lock_page
> >
> > c01240dc t truncate_list_pages
> > c0124268 T truncate_inode_pages
> > c01242d4 t writeout_one_page
>
> This is the smoking gun here, I bet, but I'd like to make sure I see the
> whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
> but I think this is the thread that hangs on to the mm semaphore.

I was able to reproduce it here with dbench. 

Nothing is locked except this dbench thread (the only dbench thread):

dbench    D C1C9FE64  5200  1013      1        (L-TLB)    1370   785
Call Trace: [___wait_on_page+130/160] [truncate_list_pages+100/404]
[truncate_inode_pages+93/128] [iput+162/360] [dput+262/356] [fput+121/232]
[exit_mmap+218/292] [mmput+56/80] [do_exit+208/680] [do_signal+566/656]
[dput+25/356] [path_release+13/60] [sys_newstat+100/112] [sys_read+188/196]
[signal_return+20/24]



Re: ps hang in 241-pre10

2001-01-27 Thread Linus Torvalds




On Sun, 28 Jan 2001, Marcelo Tosatti wrote:
> >
> > This is the smoking gun here, I bet, but I'd like to make sure I see the
> > whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
> > but I think this is the thread that hangs on to the mm semaphore.
>
> I was able to reproduce it here with dbench.
>
> Nothing is locked except this dbench thread (the only dbench thread):
>
> dbench    D C1C9FE64  5200  1013      1        (L-TLB)    1370   785
> Call Trace: [___wait_on_page+130/160] [truncate_list_pages+100/404]
> [truncate_inode_pages+93/128] [iput+162/360] [dput+262/356] [fput+121/232]
> [exit_mmap+218/292] [mmput+56/80] [do_exit+208/680] [do_signal+566/656]
> [dput+25/356] [path_release+13/60] [sys_newstat+100/112] [sys_read+188/196]
> [signal_return+20/24]

Ok, this definitely seems to be the pattern.

I don't see _what_ is going on, though.

I know of one "known bug" in pre10: if you run out of swap-space with
shared memory segments, it will do the wrong thing (return 1 without
unlocking the page). xmms might trigger this, but I didn't think that
dbench used shared memory?

There's also an ugliness in the truncate ordering. I don't think it should
matter, but I do believe it's conceptually wrong as-is.

Does this patch make any difference at all?

Linus

-
diff -u --recursive --new-file pre10/linux/mm/memory.c linux/mm/memory.c
--- pre10/linux/mm/memory.c	Sat Jan 27 10:53:39 2001
+++ linux/mm/memory.c	Sat Jan 27 19:12:35 2001
@@ -945,7 +945,6 @@
 	if (inode->i_size < offset)
 		goto do_expand;
 	inode->i_size = offset;
-	truncate_inode_pages(mapping, offset);
 	spin_lock(&mapping->i_shared_lock);
 	if (!mapping->i_mmap && !mapping->i_mmap_shared)
 		goto out_unlock;
@@ -960,8 +959,7 @@
 
 out_unlock:
 	spin_unlock(&mapping->i_shared_lock);
-	/* this should go into ->truncate */
-	inode->i_size = offset;
+	truncate_inode_pages(mapping, offset);
 	if (inode->i_op && inode->i_op->truncate)
 		inode->i_op->truncate(inode);
 	return;
diff -u --recursive --new-file pre10/linux/mm/shmem.c linux/mm/shmem.c
--- pre10/linux/mm/shmem.c	Sat Jan 27 10:53:39 2001
+++ linux/mm/shmem.c	Sat Jan 27 19:50:08 2001
@@ -217,8 +217,11 @@
 
 	info = &page->mapping->host->u.shmem_i;
 	swap = __get_swap_page(2);
-	if (!swap.val)
-		return 1;
+	if (!swap.val) {
+		set_page_dirty(page);
+		UnlockPage(page);
+		return -ENOMEM;
+	}
 
 	spin_lock(&info->lock);
 	shmem_recalc_inode(page->mapping->host);
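
[Editorial sketch] The shmem.c hunk above changes what happens when no swap slot is available: the old path returned with the page still locked, so anything that later waits on that page sleeps forever. A toy user-space model of the two failure paths follows. struct toy_page and both functions are hypothetical stand-ins, not the kernel's struct page or shmem_writepage(), and the success path is simplified to "clean and unlock".

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for the two page flags that matter here. */
struct toy_page {
    int locked;
    int dirty;
};

/* Pre-patch behaviour as described in the thread: on "no swap",
 * return 1 and leave the page locked. */
static int toy_writepage_old(struct toy_page *page, int have_swap)
{
    assert(page->locked);       /* caller hands us a locked page */
    if (!have_swap)
        return 1;               /* bug: page stays locked */
    page->dirty = 0;
    page->locked = 0;
    return 0;
}

/* Patched behaviour: re-dirty and unlock before reporting failure. */
static int toy_writepage_new(struct toy_page *page, int have_swap)
{
    assert(page->locked);
    if (!have_swap) {
        page->dirty = 1;        /* data was not written; keep it dirty */
        page->locked = 0;       /* let waiters on this page proceed */
        return -ENOMEM;
    }
    page->dirty = 0;
    page->locked = 0;
    return 0;
}
```

In the old variant a failed call leaves locked == 1, which in this model corresponds to the task stuck in ___wait_on_page in the traces; in the new variant the page comes back unlocked and dirty.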


Re: ps hang in 241-pre10

2001-01-27 Thread Marcelo Tosatti




On Sat, 27 Jan 2001, Linus Torvalds wrote:

> On Sun, 28 Jan 2001, Marcelo Tosatti wrote:
> > >
> > > This is the smoking gun here, I bet, but I'd like to make sure I see the
> > > whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
> > > but I think this is the thread that hangs on to the mm semaphore.
> >
> > I was able to reproduce it here with dbench.
> >
> > Nothing is locked except this dbench thread (the only dbench thread):
> >
> > dbench    D C1C9FE64  5200  1013      1        (L-TLB)    1370   785
> > Call Trace: [___wait_on_page+130/160] [truncate_list_pages+100/404]
> > [truncate_inode_pages+93/128] [iput+162/360] [dput+262/356] [fput+121/232]
> > [exit_mmap+218/292] [mmput+56/80] [do_exit+208/680] [do_signal+566/656]
> > [dput+25/356] [path_release+13/60] [sys_newstat+100/112] [sys_read+188/196]
> > [signal_return+20/24]
>
> Ok, this definitely seems to be the pattern.
>
> I don't see _what_ is going on, though.
>
> I know of one "known bug" in pre10: if you run out of swap-space with
> shared memory segments, it will do the wrong thing (return 1 without
> unlocking the page). xmms might trigger this, but I didn't think that
> dbench used shared memory?

It does. Bingo.

I'm not able to reproduce the problem here with your patch. 

Btw, there is another bug in shm_writepage() where it does not set the
page dirty in case of failure...


Re: ps hang in 241-pre10

2001-01-27 Thread Marcelo Tosatti



On Sun, 28 Jan 2001, Marcelo Tosatti wrote:

> On Sat, 27 Jan 2001, Linus Torvalds wrote:
>
> > On Sun, 28 Jan 2001, Marcelo Tosatti wrote:
> > > >
> > > > This is the smoking gun here, I bet, but I'd like to make sure I see the
> > > > whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
> > > > but I think this is the thread that hangs on to the mm semaphore.
> > >
> > > I was able to reproduce it here with dbench.
> > >
> > > Nothing is locked except this dbench thread (the only dbench thread):
> > >
> > > dbench    D C1C9FE64  5200  1013      1        (L-TLB)    1370   785
> > > Call Trace: [___wait_on_page+130/160] [truncate_list_pages+100/404]
> > > [truncate_inode_pages+93/128] [iput+162/360] [dput+262/356] [fput+121/232]
> > > [exit_mmap+218/292] [mmput+56/80] [do_exit+208/680] [do_signal+566/656]
> > > [dput+25/356] [path_release+13/60] [sys_newstat+100/112] [sys_read+188/196]
> > > [signal_return+20/24]
> >
> > Ok, this definitely seems to be the pattern.
> >
> > I don't see _what_ is going on, though.
> >
> > I know of one "known bug" in pre10: if you run out of swap-space with
> > shared memory segments, it will do the wrong thing (return 1 without
> > unlocking the page). xmms might trigger this, but I didn't think that
> > dbench used shared memory?
>
> It does. Bingo.
>
> I'm not able to reproduce the problem here with your patch.
>
> Btw, there is another bug in shm_writepage() where it does not set the
> page dirty in case of failure...

Why don't you just put set_page_dirty() back in page_launder() in case
writepage() fails?

Otherwise you'll have to do it in every specific implementation of
writepage().
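
[Editorial sketch] The suggestion above is to handle a failed writeout once, in the caller, rather than in each writepage() implementation. A toy C model of that design follows. All names are hypothetical, and the assumption that the caller clears the dirty flag before calling writepage() is made for illustration and is not taken from this thread.

```c
#include <assert.h>

/* Toy stand-in for a page with just the flags needed here. */
struct toy_page {
    int locked;
    int dirty;
};

typedef int (*toy_writepage_fn)(struct toy_page *page);

/*
 * Toy caller in the spirit of page_launder(): it clears the dirty flag,
 * locks the page and calls the filesystem's writepage. If that fails,
 * the caller itself marks the page dirty again, so no individual
 * writepage implementation has to remember to do it.
 */
static int toy_launder_one(struct toy_page *page, toy_writepage_fn writepage)
{
    int err;

    page->dirty = 0;
    page->locked = 1;
    err = writepage(page);
    if (err)
        page->dirty = 1;    /* central re-dirty on failure */
    return err;
}

/* A writepage that always fails and forgets to re-dirty the page. */
static int toy_failing_writepage(struct toy_page *page)
{
    page->locked = 0;
    return -1;
}

/* A writepage that always succeeds. */
static int toy_ok_writepage(struct toy_page *page)
{
    page->locked = 0;
    return 0;
}
```

With toy_failing_writepage the page ends up unlocked and dirty even though the callback never touched the dirty flag, which is the property the suggestion is after.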


Re: ps hang in 241-pre10

2001-01-27 Thread Shawn Starr


Patch appears to work,
for i in [0-9]*; do echo $i; cat $i/stat > /dev/null; done
completes successfully with xmms running in "real-time" priority.
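
[Editorial sketch] The loop above only checks that every /proc/<pid>/stat can be read. Once the reads complete, the process name and state can be pulled out of each line to spot tasks in D state. A minimal C sketch of that parse follows; toy_parse_stat is a hypothetical helper, and the sample line used with it below is made up from the pid and parent pid in the earlier trace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse the leading "pid (comm) state" fields of a /proc/<pid>/stat line.
 * The command name may itself contain spaces or parentheses, so the name
 * is taken to end at the LAST ')' on the line.
 * Returns 0 on success, -1 if the line does not look like a stat line.
 */
static int toy_parse_stat(const char *line, int *pid,
                          char *comm, size_t comm_len, char *state)
{
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')');
    size_t len;

    assert(comm_len > 0);
    if (!open || !close || close < open)
        return -1;
    if (sscanf(line, "%d", pid) != 1)
        return -1;

    len = (size_t)(close - open - 1);
    if (len >= comm_len)
        len = comm_len - 1;     /* truncate over-long names */
    memcpy(comm, open + 1, len);
    comm[len] = '\0';

    if (sscanf(close + 1, " %c", state) != 1)
        return -1;
    return 0;
}
```

For the illustrative line "713 (xmms) D 155 713 713" this yields pid 713, name xmms and state D.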

Shawn.

Marcelo Tosatti wrote:

> On Sat, 27 Jan 2001, Linus Torvalds wrote:
>
> > On Sun, 28 Jan 2001, Marcelo Tosatti wrote:
> > > >
> > > > This is the smoking gun here, I bet, but I'd like to make sure I see the
> > > > whole thing. I don't see _why_ we'd have deadlocked on __wait_on_page(),
> > > > but I think this is the thread that hangs on to the mm semaphore.
> > >
> > > I was able to reproduce it here with dbench.
> > >
> > > Nothing is locked except this dbench thread (the only dbench thread):
> > >
> > > dbench    D C1C9FE64  5200  1013      1        (L-TLB)    1370   785
> > > Call Trace: [___wait_on_page+130/160] [truncate_list_pages+100/404]
> > > [truncate_inode_pages+93/128] [iput+162/360] [dput+262/356] [fput+121/232]
> > > [exit_mmap+218/292] [mmput+56/80] [do_exit+208/680] [do_signal+566/656]
> > > [dput+25/356] [path_release+13/60] [sys_newstat+100/112] [sys_read+188/196]
> > > [signal_return+20/24]
> >
> > Ok, this definitely seems to be the pattern.
> >
> > I don't see _what_ is going on, though.
> >
> > I know of one "known bug" in pre10: if you run out of swap-space with
> > shared memory segments, it will do the wrong thing (return 1 without
> > unlocking the page). xmms might trigger this, but I didn't think that
> > dbench used shared memory?
>
> It does. Bingo.
>
> I'm not able to reproduce the problem here with your patch.
>
> Btw, there is another bug in shm_writepage() where it does not set the
> page dirty in case of failure...


Re: ps hang in 241-pre10

2001-01-26 Thread J Sloan


OK, It's official now, I didn't know if it was some
weird hardware fluke or something, but one of
the computers here exhibited the same problem -

The system in question is a Pentium II 400, scsi
only (aic7xxx), running 2.4.1-pre8 plus Andrew
Morton's low latency patches.

The user was playing unreal tournament at the time
and reported that it "got weird all of a sudden". I
logged in and tried to do a ps, but the ps froze
after listing a few lines. weird, never saw that one
before. The user rebooted, so there was no further
opportunity to investigate, but I thought I ought
to mention it after seeing these reports!

jjs


Aaron Lehmann wrote:

> On Sat, Jan 27, 2001 at 03:34:26PM +1100, John Sheahan wrote:
> > Hi
> > my box has been running 2.4.1-pre10 for three days.
> > This morning I noticed odd behaviour - ps and top would freeze
> > with no output.
>
> I had the same problem with 2.4.1-pre10 and the zerocopy patchset.
> I came home one day and xmms was frozen. Attempting to determine
> whether it was stuck in an odd state, I ran ps aux. At a certain
> point (presumably just when it started trying to print info about the
> xmms process), ps froze up too. And any attempts to killall -9 these
> processes made the killall freeze!
>
> I'm not sure what made xmms freeze up in the first place. My first
> thought was a problem in the zerocopy patchset -- most of my mp3s are
> played over NFS. However, XMMS was completely idle during the time I
> was away from the computer, so I'm not sure what caused it. It seemed
> clear, however, that the problem was contagious between processes.
>
> I reverted back to 2.4.0-ac7 and have not had any more problems of this
> nature.


Re: ps hang in 241-pre10

2001-01-26 Thread Shawn Starr


I noticed this problem in 2.4.1-pre8.

Odd, that's EXACTLY what happened to me. I had to do a hard restart as killall
locked when I tried to kill ps.

Any word on why this is happening?


Aaron Lehmann wrote:

> On Sat, Jan 27, 2001 at 03:34:26PM +1100, John Sheahan wrote:
> > Hi
> > my box has been running 2.4.1-pre10 for three days.
> > This morning I noticed odd behaviour - ps and top would freeze
> > with no output.
>
> I had the same problem with 2.4.1-pre10 and the zerocopy patchset.
> I came home one day and xmms was frozen. Attempting to determine
> whether it was stuck in an odd state, I ran ps aux. At a certain
> point (presumably just when it started trying to print info about the
> xmms process), ps froze up too. And any attempts to killall -9 these
> processes made the killall freeze!
>
> I'm not sure what made xmms freeze up in the first place. My first
> thought was a problem in the zerocopy patchset -- most of my mp3s are
> played over NFS. However, XMMS was completely idle during the time I
> was away from the computer, so I'm not sure what caused it. It seemed
> clear, however, that the problem was contagious between processes.
>
> I reverted back to 2.4.0-ac7 and have not had any more problems of this
> nature.

Re: ps hang in 241-pre10

2001-01-26 Thread Aaron Lehmann

On Sat, Jan 27, 2001 at 03:34:26PM +1100, John Sheahan wrote:
> Hi
> my box has been running 2.4.1-pre10 for three days.
> This morning I noticed odd behaviour - ps and top would freeze
> with no output.

I had the same problem with 2.4.1-pre10 and the zerocopy patchset.
I came home one day and xmms was frozen. Attempting to determine
whether it was stuck in an odd state, I ran ps aux. At a certain
point (presumably just when it started trying to print info about the
xmms process), ps froze up too. And any attempts to killall -9 these
processes made the killall freeze!

I'm not sure what made xmms freeze up in the first place. My first
thought was a problem in the zerocopy patchset -- most of my mp3s are
played over NFS. However, XMMS was completely idle during the time I
was away from the computer, so I'm not sure what caused it. It seemed
clear, however, that the problem was contagious between processes.

I reverted back to 2.4.0-ac7 and have not had any more problems of this
nature.

ps hang in 241-pre10

2001-01-26 Thread John Sheahan


Hi
my box has been running 2.4.1-pre10 for three days.
This morning I noticed odd behaviour - ps and top would freeze
with no output.

Running strace on 'ps':

open("/proc/669/environ", O_RDONLY)     = 7
read(7, "INIT_VERSION=sysvinit-2.78\0previ"..., 2047) = 254
close(7)                                = 0
stat("/proc/683", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/683/stat", O_RDONLY)        = 7
read(7,
 --- and things just stop in that window ---

I cannot read anything under /proc/683/.
Process 683 does not show up in /var/log/messages. How do I find out
what it is?
Any suggestions on how to debug this?

Kernel 2.4.1-pre10 on a 2-processor i686.
The box has run various 2.4.0-test kernels with no unusual issues.
john
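
The strace above shows ps blocking in read() on /proc/683/stat, so any tool that walks /proc in-process gets stuck at the same entry. A minimal sketch of one way to locate such entries without freezing the shell follows: each read is attempted in a throwaway child process and the parent gives up after a timeout. The function names (probe, find_hung_pids) and the timeout value are illustrative assumptions, not anything from the thread, and this was not tested on the kernels discussed here.

```python
import os
import subprocess
import sys
import time


def probe(path, timeout=2.0):
    """Try to read `path` in a child process.

    Returns 'ok' if the read finished, 'error' if it failed (for example
    the process exited meanwhile), or 'hung' if it did not return in time.
    """
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; open(sys.argv[1], 'rb').read()", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = child.poll()  # non-blocking check; never waits on the child
        if status is not None:
            return "ok" if status == 0 else "error"
        time.sleep(0.05)
    # Best effort only: a child stuck inside the kernel may ignore SIGKILL,
    # so we deliberately do not wait for it afterwards.
    child.kill()
    return "hung"


def find_hung_pids(proc_root="/proc", timeout=2.0):
    """Return the PIDs whose <proc_root>/<pid>/stat cannot be read in time."""
    pids = sorted((n for n in os.listdir(proc_root) if n.isdigit()), key=int)
    return [
        int(pid)
        for pid in pids
        if probe(os.path.join(proc_root, pid, "stat"), timeout) == "hung"
    ]
```

Calling find_hung_pids() would, under these assumptions, list the PIDs that behave like 683 above. Each stuck probe leaves behind one child that may itself be unkillable, which is the "contagious" behaviour reported in this thread, so the sketch trades a few stranded helper processes for a parent that stays responsive.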

Re: ps hang in 241-pre10

2001-01-26 Thread J Sloan


OK, it's official now. I didn't know if it was some
weird hardware fluke or something, but one of
the computers here exhibited the same problem.

The system in question is a Pentium II 400, SCSI
only (aic7xxx), running 2.4.1-pre8 plus Andrew
Morton's low-latency patches.

The user was playing Unreal Tournament at the time
and reported that it "got weird all of a sudden". I
logged in and tried to do a ps, but the ps froze
after listing a few lines. Weird, I never saw that one
before. The user rebooted, so there was no further
opportunity to investigate, but I thought I ought
to mention it after seeing these reports!

jjs


Aaron Lehmann wrote:

> On Sat, Jan 27, 2001 at 03:34:26PM +1100, John Sheahan wrote:
> > Hi
> > my box has been running 2.4.1-pre10 for three days.
> > This morning I noticed odd behaviour - ps and top would freeze
> > with no output.
>
> I had the same problem with 2.4.1-pre10 and the zerocopy patchset.
> I came home one day and xmms was frozen. Attempting to determine
> whether it was stuck in an odd state, I ran ps aux. At a certain
> point (presumably just when it started trying to print info about the
> xmms process), ps froze up too. And any attempts to killall -9 these
> processes made the killall freeze!
>
> I'm not sure what made xmms freeze up in the first place. My first
> thought was a problem in the zerocopy patchset -- most of my mp3s are
> played over NFS. However, XMMS was completely idle during the time I
> was away from the computer, so I'm not sure what caused it. It seemed
> clear, however, that the problem was contagious between processes.
>
> I reverted back to 2.4.0-ac7 and have not had any more problems of this
> nature.
