Re: [BUG] /proc/pid/stat access stalls badly for swapping process,2.4.0-test10

2000-11-10 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Mike Galbraith  [EMAIL PROTECTED] wrote:
 
 (This scenario, btw, is much harder to trigger on SMP than on UP. And
 it's completely separate from the issue of simple disk bandwidth issues
 which can obviously cause no end of stalls on anything that needs the
 disk, and which can also happen on SMP).

Unfortunately, it didn't help in the scenario I'm running.

time make -j30 bzImage:

real14m19.987s  (within stock variance)
user6m24.480s
sys 1m12.970s

Note that the above kind of "throughput performance" should not have been
affected, and was not what I was worried about. 

procs                       memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
31  2  1 12   1432   4440  12660   0  1227   151  202   848  89  11   0
34  4  1   1908   2584536   5376 248 1904   602   763  785  4094  63  32  5
13 19  1  64140  67728604  33784 106500 84612 43625 21683 19080 52168  28  22  50

Looks like there was a big delay in vmstat there - that could easily be
due to simple disk throughput issues..

Does it feel any different under the original load that got the original
complaint? The patch may have just been buggy and ineffective, for all I
know. 

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [BUG] /proc/pid/stat access stalls badly for swapping process,2.4.0-test10

2000-11-10 Thread David Mansfield

Linus Torvalds wrote:
...
 
 And it has everything to do with the fact that the way Linux semaphores
 are implemented, a non-blocking process has a HUGE advantage over a
 blocking one. Linux kernel semaphores are extremely unfair in that way.

...
 The original running process comes back faulting again, finds the
 semaphore still unlocked (the "ps" process is awake but has not gotten to
 run yet), gets the semaphore, and falls asleep on the IO for the next
 page.
 
 The "ps" process actually gets to run now, but it's a bit late. The
 semaphore is locked again.
 
 Repeat until luck breaks the bad circle.
 

But doesn't __down have a fast path coded in assembly?  In other words,
it only hits your patched code if there is already contention, which
there isn't in this case, and therefore the bug...?

David Mansfield
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [BUG] /proc/pid/stat access stalls badly for swapping process,2.4.0-test10

2000-11-10 Thread Linus Torvalds

In article [EMAIL PROTECTED],
David Mansfield  [EMAIL PROTECTED] wrote:
Linus Torvalds wrote:
...
 
 And it has everything to do with the fact that the way Linux semaphores
 are implemented, a non-blocking process has a HUGE advantage over a
 blocking one. Linux kernel semaphores are extremely unfair in that way.

...
 The original running process comes back faulting again, finds the
 semaphore still unlocked (the "ps" process is awake but has not gotten to
 run yet), gets the semaphore, and falls asleep on the IO for the next
 page.
 
 The "ps" process actually gets to run now, but it's a bit late. The
 semaphore is locked again.
 
 Repeat until luck breaks the bad circle.
 

But doesn't __down have a fast path coded in assembly?  In other words,
it only hits your patched code if there is already contention, which
there isn't in this case, and therefore the bug...?

The __down() case should be hit if there's a waiter, even if that waiter
has not yet been able to pick up the lock (the waiter _will_ have
decremented the count to negative in order to trigger the proper logic
at release time).
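
Conceptually the fast path does something like this (a sketch only; the
real thing in arch/i386/kernel/semaphore.c is a locked "decl" in assembly
and does the decrement and the test in one atomic instruction):

	static inline void down_sketch(struct semaphore *sem)
	{
		/*
		 * Any sleeper leaves sem->count negative, so a new
		 * down() sees its decrement go negative and falls
		 * into __down(), where the sleeper bookkeeping (and
		 * any fairness logic) lives.
		 */
		atomic_dec(&sem->count);
		if (atomic_read(&sem->count) < 0)
			__down(sem);
	}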

But as I mentioned, the pseudo-patch was certainly untested, so
somebody should probably walk through the cases to check that I didn't
miss something.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [BUG] /proc/pid/stat access stalls badly for swapping process,2.4.0-test10

2000-11-09 Thread Linus Torvalds



As to the real reason for stalls on /proc/pid/stat, I bet it has nothing
to do with IO except indirectly (the IO is necessary to trigger the
problem, but the _reason_ for the problem lies elsewhere).

And it has everything to do with the fact that the way Linux semaphores
are implemented, a non-blocking process has a HUGE advantage over a
blocking one. Linux kernel semaphores are extremely unfair in that way.

What happens is that some process is getting a lot of VM faults and gets
its VM semaphore. No contention yet. It holds the semaphore over the
IO, and now another process does a "ps".

The "ps" process goes to sleep on the semaphore. So far so good.

The original process releases the semaphore, which increments the count,
and wakes up the process waiting for it. Note that it _wakes_ it, it does
not give the semaphore to it. Big difference.
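
The release side is roughly this (again a sketch, not the actual assembly
in arch/i386/kernel/semaphore.c, which does the increment and the test as
one locked instruction):

	static inline void up_sketch(struct semaphore *sem)
	{
		atomic_inc(&sem->count);
		if (atomic_read(&sem->count) <= 0)
			wake_up(&sem->wait);	/* wake a sleeper, but nothing
						   reserves the semaphore for it */
	}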

The process that got woken up will run eventually. Probably not all that
immediately, because the process that woke it (and held the semaphore)
just slept on a page fault too, so it's not likely to immediately
relinquish the CPU.

The original running process comes back faulting again, finds the
semaphore still unlocked (the "ps" process is awake but has not gotten to
run yet), gets the semaphore, and falls asleep on the IO for the next
page.

The "ps" process actually gets to run now, but it's a bit late. The
semaphore is locked again. 

Repeat until luck breaks the bad circle.

(This scenario, btw, is much harder to trigger on SMP than on UP. And
it's completely separate from the issue of simple disk bandwidth issues
which can obviously cause no end of stalls on anything that needs the
disk, and which can also happen on SMP).

NOTE! If somebody wants to fix this, the fix should be reasonably simple
but needs to be quite exhaustively checked and double-checked. It's just
too easy to break the semaphores by mistake.

The way to make semaphores more fair is to NOT allow a new process to just
come in immediately and steal the semaphore in __down() if there are other
sleepers. This is most easily accomplished by something along the lines of
the following in __down() in arch/i386/kernel/semaphore.c 

	spin_lock_irq(&semaphore_lock);
	sem->sleepers++;
+
+	/*
+	 * Are there other people waiting for this?
+	 * They get to go first.
+	 */
+	if (sleepers > 1)
+		goto inside;
	for (;;) {
		int sleepers = sem->sleepers;

		/*
		 * Add "everybody else" into it. They aren't
		 * playing, because we own the spinlock.
		 */
		if (!atomic_add_negative(sleepers - 1, &sem->count)) {
			sem->sleepers = 0;
			break;
		}
		sem->sleepers = 1;	/* us - see -1 above */
+inside:
		spin_unlock_irq(&semaphore_lock);
		schedule();
		tsk->state = TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE;
		spin_lock_irq(&semaphore_lock);
	}
	spin_unlock_irq(&semaphore_lock);

But note that the above is UNTESTED and also note that from a throughput
(as opposed to latency) standpoint being unfair tends to be nice.

Anybody want to try out something like the above? (And no, I'm not
applying it to my tree yet. It needs about a hundred pairs of eyes to
verify that there isn't some subtle "lost wakeup" race somewhere).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [BUG] /proc/pid/stat access stalls badly for swapping process,2.4.0-test10

2000-11-09 Thread Mike Galbraith

On Thu, 9 Nov 2000, Linus Torvalds wrote:

 
 
 As to the real reason for stalls on /proc/pid/stat, I bet it has nothing
 to do with IO except indirectly (the IO is necessary to trigger the
 problem, but the _reason_ for the problem lies elsewhere).
 
 And it has everything to do with the fact that the way Linux semaphores
 are implemented, a non-blocking process has a HUGE advantage over a
  blocking one. Linux kernel semaphores are extremely unfair in that way.
 
 What happens is that some process is getting a lot of VM faults and gets
  its VM semaphore. No contention yet. It holds the semaphore over the
 IO, and now another process does a "ps".
 
 The "ps" process goes to sleep on the semaphore. So far so good.
 
 The original process releases the semaphore, which increments the count,
 and wakes up the process waiting for it. Note that it _wakes_ it, it does
 not give the semaphore to it. Big difference.
 
 The process that got woken up will run eventually. Probably not all that
 immediately, because the process that woke it (and held the semaphore)
 just slept on a page fault too, so it's not likely to immediately
 relinquish the CPU.
 
 The original running process comes back faulting again, finds the
 semaphore still unlocked (the "ps" process is awake but has not gotten to
 run yet), gets the semaphore, and falls asleep on the IO for the next
 page.
 
 The "ps" process actually gets to run now, but it's a bit late. The
 semaphore is locked again. 
 
 Repeat until luck breaks the bad circle.
 
  (This scenario, btw, is much harder to trigger on SMP than on UP. And
 it's completely separate from the issue of simple disk bandwidth issues
 which can obviously cause no end of stalls on anything that needs the
 disk, and which can also happen on SMP).

Unfortunately, it didn't help in the scenario I'm running.

time make -j30 bzImage:

real14m19.987s  (within stock variance)
user6m24.480s
sys 1m12.970s

procs                       memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
31  2  1 12   1432   4440  12660   0  1227   151  202   848  89  11   0
34  4  1   1908   2584536   5376 248 1904   602   763  785  4094  63  32  5
13 19  1  64140  67728604  33784 106500 84612 43625 21683 19080 52168  28  22  50

I understood the above well enough to be very interested in seeing what
happens with flush IO restricted.

-Mike

[try_to_free_pages()->swap_out()/shm_swap().. can fight over who gets
to shrink the best candidate's footprint?]

Thanks!

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [BUG] /proc/pid/stat access stalls badly for swapping process, 2.4.0-test10

2000-11-03 Thread Jens Axboe

On Fri, Nov 03 2000, Mike Galbraith wrote:
  I very much agree.  Kflushd is still hungry for free write
  bandwidth here.
 
 In the LKML tradition of code talks and silly opinions walk...
 
 Attached is a diagnostic patch which gets kflushd under control,
 and takes make -j30 bzImage build times down from 12 minutes to
 9 here.  I have no more massive context switching on write, and
 copies seem to go a lot quicker to boot.  (that may be because
 some of my failures were really _really_ horrible)
 
 Comments are very welcome.  I haven't had problems with this yet,
 but it's early so...  This patch isn't supposed to be pretty either
 (hw techs don't do pretty;) it's only supposed to say 'Houston...'
 so be sure to grab a barfbag before you take a look. 

Super, looks pretty good from here. I'll give it a go when I get back.
In addition, here's a small patch that disables the read stealing
of requests from the write list -- does that improve behaviour
when we are busy flushing?

-- 
* Jens Axboe [EMAIL PROTECTED]
* SuSE Labs


--- drivers/block/ll_rw_blk.c~  Fri Nov  3 03:22:25 2000
+++ drivers/block/ll_rw_blk.c   Fri Nov  3 03:23:24 2000
@@ -455,35 +455,17 @@
 	struct list_head *list = &q->request_freelist[rw];
 	struct request *rq;
 
-	/*
-	 * Reads get preferential treatment and are allowed to steal
-	 * from the write free list if necessary.
-	 */
 	if (!list_empty(list)) {
 		rq = blkdev_free_rq(list);
-		goto got_rq;
-	}
-
-	/*
-	 * if the WRITE list is non-empty, we know that rw is READ
-	 * and that the READ list is empty. allow reads to 'steal'
-	 * from the WRITE list.
-	 */
-	if (!list_empty(&q->request_freelist[WRITE])) {
-		list = &q->request_freelist[WRITE];
-		rq = blkdev_free_rq(list);
-		goto got_rq;
+		list_del(&rq->table);
+		rq->free_list = list;
+		rq->rq_status = RQ_ACTIVE;
+		rq->special = NULL;
+		rq->q = q;
+		return rq;
 	}
 
 	return NULL;
-
-got_rq:
-	list_del(&rq->table);
-	rq->free_list = list;
-	rq->rq_status = RQ_ACTIVE;
-	rq->special = NULL;
-	rq->q = q;
-	return rq;
 }
 
 /*



Re: [BUG] /proc/pid/stat access stalls badly for swapping process,2.4.0-test10

2000-11-03 Thread Mike Galbraith

On Fri, 3 Nov 2000, Jens Axboe wrote:

 On Fri, Nov 03 2000, Mike Galbraith wrote:
   I very much agree.  Kflushd is still hungry for free write
   bandwidth here.
  
  In the LKML tradition of code talks and silly opinions walk...
  
  Attached is a diagnostic patch which gets kflushd under control,
  and takes make -j30 bzImage build times down from 12 minutes to
  9 here.  I have no more massive context switching on write, and
  copies seem to go a lot quicker to boot.  (that may be because
  some of my failures were really _really_ horrible)
  
  Comments are very welcome.  I haven't had problems with this yet,
  but it's early so...  This patch isn't supposed to be pretty either
  (hw techs don't do pretty;) it's only supposed to say 'Houston...'
  so be sure to grab a barfbag before you take a look. 
 
 Super, looks pretty good from here. I'll give it a go when I get back.
 In addition, here's a small patch that disables the read stealing
 of requests from the write list -- does that improve behaviour
 when we are busy flushing?

Yes.  I've done this a bit differently here, and have had good
results.  I only disable stealing when I need flush throughput.

Now that the box isn't biting off more than it can chew quite
as often, I'll try this again.  I'm pretty darn sure that I can
get more throughput, but: I've learned that getting too much
can do really OOGLY things. (turns box into single user single
tasking streaming IO monster from hell)

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [BUG] /proc/pid/stat access stalls badly for swapping process, 2.4.0-test10

2000-11-02 Thread Christoph Rohland

Hi Rik,

I can probably give some more datapoints. Here is the console output
of my test machine (there is a 'vmstat 5' running in background):

[root@ls3016 /root]# killall shmtst
[root@ls3016 /root]#
 1 12  2  0 1607668  18932 2110496   0   0 67154 1115842 1050063 2029389   0   2  98
 0 10  2  0 1607564  18932 2110496   0   0 0   300  317   426   0   0 100
 0 10  2  0 1607408  18932 2110496   0   0 0   301  336   473   0   0 100
 0 10  2  0 1607560  18932 2110508   0   0 0   307  318   430   0   0 100
 0 10  2  0 1607556  18932 2110512   0   0 0   304  324   433   0   0 100
 0 10  2  0 1607528  18932 2110512   0   0 0   272  308   410   0   1  99
 0 10  2  0 1607440  18932 2110516   0   0 0   315  323   438   0   1  99
 0 10  2  0 1607528  18932 2110516   0   0 0   323  316   424   0   0 100
 0 10  2  0 1607556  18932 2110516   0   0 0   304  309   410   0   0 100
 0 10  2  0 1607600  18932 2110528   0   0 0   298  314   418   0   0 100
 0 10  2  0 1607384  18932 2110528   0   0 0   296  307   406   0   1  99
 0 10  2  0 1607284  18932 2110528   0   0 0   304  315   421   0   0 100
 0 10  2  0 1607668  18932 2110528   0   0 0   298  304   402   0   0 100
 0 10  2  0 1607576  18932 2110528   0   0 0   285  307   405   0   0 100
 0 10  2  0 1607656  18932 2110528   0   0 0   292  303   399   0   1  99
 0 10  2  0 1607928  18932 2110528   0   0 0   313  310   408   0   0 100
procs                       memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
 0 10  2  0 1608440  18932 2110528   0   0 0   340  313   417   0   1  99
 0 10  2  0 1608260  18932 2110528   0   0 0   298  318   426   0   0 100
 0 10  2  0 1608208  18932 2110528   0   0 0   314  334   448   0   1  99
 0 10  2  0 1608396  18932 2110528   0   0 0   323  316   421   0   1  99
 0 10  2  0 1608204  18932 2110548   0   0 0   334  333   458   0   0 100
 0 10  2  0 1607888  18932 2110580   0   0 0   336  329   448   0   1  99
 0 10  2  0 1608040  18932 2110584   0   0 0   317  321   435   0   0 100
 0 10  2  0 1608032  18932 2110588   0   0 0   241  318   425   0   0 100
 0 10  2  0 1608028  18932 2110592   0   0 0   257  325   443   0   1  99
 0 10  3  0 1608028  18932 2110592   0   0 0   258  323   435   0   0  99
 0 10  2  0 1608032  18932 2110592   0   0 0   241  316   425   0   0 100
 0 10  2  0 1608024  18932 2110592   0   0 0   261  337   460   0   0 100
 0 10  2  0 1608016  18932 2110592   0   0 0   253  328   444   0   0 100
 0 10  2  0 1608024  18932 2110592   0   0 0   252  320   435   0   0 100
 0 10  2  0 1608012  18932 2110592   0   0 0   255  326   446   0   0 100
 0 10  2  0 1608020  18932 2110592   0   0 0   255  326   444   0   1  99
 0 10  2  0 1608012  18932 2110600   0   0 0   261  341   469   0   0 100
 0 10  2  0 1607992  18932 2110608   0   0 0   261  344   479   0   0 100
 0 10  2  0 1607992  18932 2110612   0   0 0   264  342   471   0   0 100
 0 10  2  0 1607984  18932 2110612   0   0 0   266  334   462   0   0 100
 0 10  2  0 1607980  18932 2110620   0   0 0   273  340   468   0   0  99
procs                       memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
 0 10  2  0 1607972  18932 2110624   0   0 0   266  345   474   0   1  99
 0 10  2  0 1607940  18932 2110640   0   0 0   256  341   462   0   0 100
 0 10  2  0 1607936  18932 2110644   0   0 0   262  339   462   0   1  99
 0 10  2  0 1607940  18932 2110644   0   0 0   261  333   450   0   1  99
 0 10  2  0 1607944  18932 2110644   0   0 0   253  335   454   0   0 100
 0 10  2  0 1607944  18932 2110644   0   0 0   272  352   479   0   1  99

[root@ls3016 /root]# ps l
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTYTIME COMMAND
100 0   820 1   9   0  2200 1168 wait4  SttyS0  0:00 login -- ro
100 0   862   820  14   0  1756  976 wait4  SttyS0  0:00 -bash
000 0   878   862   9   0  1080  360 down   DttyS0 11:27 ./shmtst 10
000 0   879   862   9   0  1080  360 down   DttyS0 15:21 ./shmtst 15
040 0   880   878   9   0  1092  416 wait_o DttyS0  8:55 ./shmtst 10
040 0   881   878   9   0  1080  360 down   DttyS0 10:22 ./shmtst 10
444 0   882   878   9   0 00 do_exi ZttyS0 10:00 [shmtst de
040 0   883   878   9   0  1092  416 wait_o DttyS0  9:30 ./shmtst 10
040 0   884   878   9   0  1092  416 down   DttyS0  8:44 ./shmtst 10
040 0   885   878   9   0  1092  416 down   DttyS0  9:01 ./shmtst 10
444 0   886   878   9   0 0  

Re: [BUG] /proc/pid/stat access stalls badly for swapping process, 2.4.0-test10

2000-11-02 Thread Jens Axboe

On Thu, Nov 02 2000, Val Henson wrote:
   3) combine this with the elevator starvation stuff (ask Jens
  Axboe for blk-7 to alleviate this issue) and you have a
  scenario where processes using /proc/pid/stat have the
  possibility to block on multiple processes that are in the
  process of handling a page fault (but are being starved)
  
  I'm experimenting with blk.[67] in test10 right now.  The stalls
  are not helped at all.  It doesn't seem to become request bound
  (haven't instrumented that yet to be sure) but the stalls persist.
  
  -Mike
 
 This is not an elevator starvation problem.

True, but the blk-xx patches help work around what I believe is
bad flushing behaviour by the VM.

 I also experienced these stalls with my IDE-only system.  Unless I'm
 badly mistaken, the elevator is only used on SCSI disks, therefore
 elevator starvation cannot be blamed for this problem.  These stalls
 are particularly annoying since I want to find the pid of the process
 hogging memory in order to kill it, but the read from /proc stalls for
 45 seconds or more.

You are badly mistaken.

-- 
* Jens Axboe [EMAIL PROTECTED]
* SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [BUG] /proc/pid/stat access stalls badly for swapping process,2.4.0-test10

2000-11-02 Thread Mike Galbraith

On Thu, 2 Nov 2000, Jens Axboe wrote:

 On Thu, Nov 02 2000, Val Henson wrote:
3) combine this with the elevator starvation stuff (ask Jens
   Axboe for blk-7 to alleviate this issue) and you have a
   scenario where processes using /proc/pid/stat have the
   possibility to block on multiple processes that are in the
   process of handling a page fault (but are being starved)
   
   I'm experimenting with blk.[67] in test10 right now.  The stalls
   are not helped at all.  It doesn't seem to become request bound
   (haven't instrumented that yet to be sure) but the stalls persist.
   
 -Mike
  
  This is not an elevator starvation problem.
 
 True, but the blk-xx patches help work around what I believe is
 bad flushing behaviour by the VM.

I very much agree.  Kflushd is still hungry for free write
bandwidth here.

Of course it's _going_ to have to wait if you're doing max IO
throughput, but when you're flushing, you need to let kflushd
have the bandwidth it needs to do its job.  I don't think it's
getting what it needs, and am trying two things.

1.  Revoke read's ability to steal requests while we're in a
heavy flushing situation.  Flushing must proceed, and it must
go at full speed.  (Actually, reversing the favoritism when
you need flush bandwidth makes sense to me, and does help if
limited.. if not limited, it hurts like hell.)  A rough sketch
of the idea follows after point 2.

2.  Use the information that we are starving (or going full bore)
to tell the VM to keep its fingers off dirty buffers.  If we're
flushing at disk speed, page_launder() can't do anything useful
with dirty buffers; it can only do harm IMHO.
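
To put point 1 in code, the idea is roughly this (just a sketch, not the
diagnostic patch itself; flushing_is_starved() is a made-up predicate
standing in for whatever heuristic decides we need flush bandwidth):

	static struct request *get_request_sketch(request_queue_t *q, int rw)
	{
		struct list_head *list = &q->request_freelist[rw];

		if (!list_empty(list))
			return blkdev_free_rq(list);	/* rq init omitted */

		/*
		 * Reads may steal from the WRITE freelist only while
		 * we aren't starving the flush daemons for requests.
		 */
		if (rw == READ && !flushing_is_starved() &&
		    !list_empty(&q->request_freelist[WRITE]))
			return blkdev_free_rq(&q->request_freelist[WRITE]);

		return NULL;
	}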

-Mike

P.S.  Before I revert to Luke Codecrawler mode, I have a wild
problem theory I'd appreciate comments on.. preferably the kind
where I become extremely busy thinking about their content ;-)

If one __alloc_pages() is waiting for kswapd, kswapd tries to do
synchronous flushing.. if the write queue is nearly (or fully) exhausted
and page_launder() couldn't clean any buffers on its first pass,
it blasts the queue some more and stalls.  If kswapd, kflushd and
kupdate are all waiting for a request, and then say a GFP_BUFFER
allocation comes along.. (we're low on memory) we do SCHED_YIELD
schedule().  If we're holding IO locks, nobody can do IO.  OK, if
there's nobody else running, we come right back and either finish
the allocation or fail.  But, if you have other allocations trying
to flush buffers (GFP_KERNEL eg), they are not only in danger of
stacking up due to a request shortage, but they can't get whatever
IO locks the GFP_BUFFER allocation is holding anyway so are doomed
until we do schedule back to the GFP_BUFFER allocating task.

Isn't scheduling while holding IO locks the wrong thing to do?  It's
protected from neither GFP_BUFFER nor PF_MEMALLOC.

I must be missing something.. but what?

ears at maximum gain

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [BUG] /proc/pid/stat access stalls badly for swapping process, 2.4.0-test10

2000-11-01 Thread Rik van Riel

On Wed, 1 Nov 2000, David Mansfield wrote:

 I'd like to report what seems like a performance problem in the latest
 kernels.  Actually, all recent kernels have exhibited this problem, but
 I was waiting for the new VM stuff to stabilize before reporting it. 
 
 My test is: run 7 processes that each allocate and randomly
 access 32mb of ram (on a 256mb machine).  Even though 7*32MB =
 224MB, this still sends the machine lightly into swap.  The
 machine continues to function fairly smoothly for the most part.  
 I can do filesystem operations, run new programs, move desktops
 in X etc.
 
 Except: programs which access /proc/pid/stat stall for an
 indeterminate amount of time.  For example, 'ps' and 'vmstat'
 stall BADLY in these scenarios.  I have had the stalls last over
 a minute in higher VM pressure situations.

I have one possible reason for this 

1) the procfs process does (in fs/proc/array.c::proc_pid_stat)
down(&mm->mmap_sem);

2) but, in order to do that, it has to wait until the process
   it is trying to stat has /finished/ its page fault, and is
   not into its next one ...

3) combine this with the elevator starvation stuff (ask Jens
   Axboe for blk-7 to alleviate this issue) and you have a
   scenario where processes using /proc/pid/stat have the
   possibility to block on multiple processes that are in the
   process of handling a page fault (but are being starved)
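
For reference, the path in question looks roughly like this (paraphrased
from fs/proc/array.c, not a verbatim copy; the other fields and error
handling are left out):

	static int proc_pid_stat_sketch(struct task_struct *task, char *buffer)
	{
		struct mm_struct *mm = task->mm;
		unsigned long vsize = 0;

		if (mm) {
			struct vm_area_struct *vma;

			/* sleeps until the target finishes its current fault */
			down(&mm->mmap_sem);
			for (vma = mm->mmap; vma; vma = vma->vm_next)
				vsize += vma->vm_end - vma->vm_start;
			up(&mm->mmap_sem);
		}
		return sprintf(buffer, "%d (%s) ... %lu ...\n",
			       task->pid, task->comm, vsize);
	}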

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [BUG] /proc/pid/stat access stalls badly for swapping process, 2.4.0-test10

2000-11-01 Thread Mike Galbraith

On Wed, 1 Nov 2000, Rik van Riel wrote:

 On Wed, 1 Nov 2000, David Mansfield wrote:
 
  I'd like to report what seems like a performance problem in the latest
  kernels.  Actually, all recent kernels have exhibited this problem, but
  I was waiting for the new VM stuff to stabilize before reporting it. 
  
  My test is: run 7 processes that each allocate and randomly
  access 32mb of ram (on a 256mb machine).  Even though 7*32MB =
  224MB, this still sends the machine lightly into swap.  The
  machine continues to function fairly smoothly for the most part.  
  I can do filesystem operations, run new programs, move desktops
  in X etc.
  
  Except: programs which access /proc/pid/stat stall for an
   indeterminate amount of time.  For example, 'ps' and 'vmstat'
  stall BADLY in these scenarios.  I have had the stalls last over
  a minute in higher VM pressure situations.
 
 I have one possible reason for this 
 
 1) the procfs process does (in fs/proc/array.c::proc_pid_stat)
   down(&mm->mmap_sem);
 
 2) but, in order to do that, it has to wait until the process
it is trying to stat has /finished/ its page fault, and is
not into its next one ...
 
 3) combine this with the elevator starvation stuff (ask Jens
Axboe for blk-7 to alleviate this issue) and you have a
scenario where processes using /proc/pid/stat have the
possibility to block on multiple processes that are in the
process of handling a page fault (but are being starved)

I'm experimenting with blk.[67] in test10 right now.  The stalls
are not helped at all.  It doesn't seem to become request bound
(haven't instrumented that yet to be sure) but the stalls persist.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/