Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-28 Thread Jens Axboe

On Sun, Jan 28 2001, Linus Torvalds wrote:
> On Sun, 28 Jan 2001, Jens Axboe wrote:
> > 
> > How about this instead?
> 
> I really don't like this one. It will basically re-introduce the old
> behaviour of waking people up in a trickle, as far as I can tell. The
> reason we want the batching is to make people have more requests to sort
> in the elevator, and as far as I can tell this will just hurt that.
> 
> Are there any downsides to just _always_ batching, regardless of whether
> the request freelist is empty or not? Sure, it will make the "effective"
> size of the freelist a bit smaller, but that's probably not actually
> noticeable under any load except for the one that empties the freelist (in
> which case the old code would have triggered the batching anyway).

The problem with removing the !list_empty test like you suggested
is that batching is no longer controlled. If we start
batching once the lists are empty and start wakeups once batch_requests
has been reached, we know we'll give the elevator enough to work
with to be effective. With !list_empty removed, batch_requests is no
longer a measure of how many requests we want to batch. Always
batching is not a problem in itself; the effect of the smaller
effective freelist should be negligible.
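
(For reference, here is roughly the release-side logic under discussion -- a
paraphrase of blkdev_release_request() from 2.4.1-pre10, reconstructed from
the thread rather than copied from the source; in particular the pending-list
names (pending_freelist, pending_free) are assumptions:)

void blkdev_release_request(struct request *req)
{
	request_queue_t *q = req->q;
	int rw = req->cmd;

	req->rq_status = RQ_INACTIVE;
	req->q = NULL;

	if (!q)
		return;

	/*
	 * Freelist not empty: put the request straight back.  Note that
	 * this path wakes nobody up -- this is the lost-wakeup window
	 * Linus describes further down the thread.
	 */
	if (!list_empty(&q->request_freelist[rw])) {
		blk_refill_freelist(q, rw);
		list_add(&req->table, &q->request_freelist[rw]);
		return;
	}

	/*
	 * Freelist empty: batch released requests on a pending list, and
	 * only once batch_requests of them have accumulated splice them
	 * back onto the freelist and wake that many sleepers in one go.
	 */
	list_add(&req->table, &q->pending_freelist[rw]);
	if (++q->pending_free[rw] >= batch_requests) {
		blk_refill_freelist(q, rw);
		wake_up_nr(&q->wait_for_request, batch_requests);
	}
}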

The patch I sent will only trickle wakeups when batching is already
in effect but the batch_requests wakeups were not enough to deplete
the freelist again. At least that was the intended effect :-)

> Performance numbers?

Don't have any right now, will test a bit later.

-- 
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs



Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-28 Thread Linus Torvalds



On Sun, 28 Jan 2001, Jens Axboe wrote:
> 
> How about this instead?

I really don't like this one. It will basically re-introduce the old
behaviour of waking people up in a trickle, as far as I can tell. The
reason we want the batching is to make people have more requests to sort
in the elevator, and as far as I can tell this will just hurt that.

Are there any downsides to just _always_ batching, regardless of whether
the request freelist is empty or not? Sure, it will make the "effective"
size of the freelist a bit smaller, but that's probably not actually
noticeable under any load except for the one that empties the freelist (in
which case the old code would have triggered the batching anyway).

Performance numbers?

Linus




Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-28 Thread Jens Axboe

On Sun, Jan 28 2001, Lorenzo Allegrucci wrote:
> >Ho humm. Jens: imagine that you have more people waiting for requests than
> >"batchcount". Further, imagine that you have multiple requests finishing
> >at the same time. Not unlikely. Now, imagine that one request finishes,
> >and causes "batchcount" users to wake up, and immediately another request
> >finishes but THAT one doesn't wake anybody up because it notices that the
> >freelist isn't empty - so it thinks that it doesn't need to wake anybody.
> >
> >Lorenzo, does the problem go away for you if you remove the
> >
> > if (!list_empty(&q->request_freelist[rw])) {
> > ...
> > }
> >
> >code from blkdev_release_request() in drivers/block/ll_rw_blk.c?
> 
> Yes, it does.

How about this instead?

--- /opt/kernel/linux-2.4.1-pre10/drivers/block/ll_rw_blk.c	Thu Jan 25 19:15:12 2001
+++ drivers/block/ll_rw_blk.c	Sun Jan 28 19:22:20 2001
@@ -633,6 +634,8 @@
 	if (!list_empty(&q->request_freelist[rw])) {
 		blk_refill_freelist(q, rw);
 		list_add(&req->table, &q->request_freelist[rw]);
+		if (waitqueue_active(&q->wait_for_request))
+			wake_up_nr(&q->wait_for_request, 2);
 		return;
 	}
 

-- 
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs



Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-28 Thread Lorenzo Allegrucci

At 15.40 27/01/01 -0800, you wrote:
>
>
>On Sat, 27 Jan 2001, Lorenzo Allegrucci wrote:
>> 
>> A trivial "while(1) fork()" is enough to trigger it.
>> "mem=32M" by lilo, ulimit -u is 1024.
>
>Hmm.. This does not look like a VM deadlock - it looks like some IO
>request is waiting forever on "__get_request_wait()". In fact, it looks
>like a _lot_ of people are waiting for requests.
>
>So what happens is that somebody takes a page fault (and gets the mm
>lock), tries to read something in, and never gets anything back, thus
>leaving the MM locked.
>
>Jens: this looks suspiciously like somebody isn't waking things up when
>they add requests back to the request lists. Alternatively, maybe the
>unplugging isn't properly done, so that we have a lot of pending IO that
>doesn't get started..
>
>Ho humm. Jens: imagine that you have more people waiting for requests than
>"batchcount". Further, imagine that you have multiple requests finishing
>at the same time. Not unlikely. Now, imagine that one request finishes,
>and causes "batchcount" users to wake up, and immediately another request
>finishes but THAT one doesn't wake anybody up because it notices that the
>freelist isn't empty - so it thinks that it doesn't need to wake anybody.
>
>Lorenzo, does the problem go away for you if you remove the
>
>   if (!list_empty(&q->request_freelist[rw])) {
>   ...
>   }
>
>code from blkdev_release_request() in drivers/block/ll_rw_blk.c?

Yes, it does.

--
Lorenzo



Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-27 Thread Marcelo Tosatti


On Sun, 28 Jan 2001, Jens Axboe wrote:

> On Sat, Jan 27 2001, Linus Torvalds wrote:
> > > What was the trace of this? Just curious, the below case outlined by
> > > Linus should be pretty generic, but I'd still like to know what
> > > can lead to this condition.
> > 
> > It was posted on linux-kernel - I don't save the dang things because I
> > have too much in my "archives" as is ;)
> 
> Ok I see it now, confused wrt the different threads...
> 
> > > Good spotting. Actually I see one more problem with it too. If
> > > we've started batching (under heavy I/O of course), we could
> > > splice the pending list and wake up X number of sleepers, but
> > > there's a) no guarantee that these sleepers will actually get
> > > the requests if new ones keep flooding in
> > 
> > (a) is ok. They'll go back to sleep - it's a loop waiting for requests..
> 
> My point is not that it's broken, but that it will favor newcomers
> instead of tasks that have already blocked waiting for a free slot. So it
> would still be nice to get right.
> 
> > >and b) no guarantee
> > > that X sleepers require X request slots.
> > 
> > Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
> > them), they _will_ use a request. I don't think we have to worry about
> > that. At most we will wake up "too many" - we'll wake up processes even
> > though they end up not being able to get a request anyway because somebody
> > else got to it first. And that's ok. It's the "wake up too few" that
> > causes trouble, and I think that will be fixed by my suggestion.
> 
> Yes, they may end up sleeping right away again, as per the above a) case
> for instance. The logic now is 'we have X free slots now, wake up
> X sleepers' where it instead should be 'we have X free slots now,
> wake up people until the free list is exhausted'.
> 
> > Now, I'd worry if somebody wants several requests at the same time, and
> > doesn't feed them to the IO layer until it has gotten all of them. In that
> > case, you can get starvation, with many people having "reserved" their
> > requests and there not being enough free requests around to actually ever
> > wake anybody up again. But the regular IO paths do not do this: they will
> > all allocate a request and just submit it immediately, no "reservation".
> 
> Right, the I/O path doesn't do this and it would seem more appropriate
> to have such users use their own requests instead of eating from
> the internal pool.
> 
> -- 
> * Jens Axboe <[EMAIL PROTECTED]>
> * SuSE Labs



Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-27 Thread Jens Axboe

On Sat, Jan 27 2001, Linus Torvalds wrote:
> > What was the trace of this? Just curious, the below case outlined by
> > Linus should be pretty generic, but I'd still like to know what
> > can lead to this condition.
> 
> It was posted on linux-kernel - I don't save the dang things because I
> have too much in my "archives" as is ;)

Ok I see it now, confused wrt the different threads...

> > Good spotting. Actually I see one more problem with it too. If
> > we've started batching (under heavy I/O of course), we could
> > splice the pending list and wake up X number of sleepers, but
> > there's a) no guarantee that these sleepers will actually get
> > the requests if new ones keep flooding in
> 
> (a) is ok. They'll go back to sleep - it's a loop waiting for requests..

My point is not that it's broken, but that it will favor newcomers
instead of tasks that have already blocked waiting for a free slot. So it
would still be nice to get right.

> >  and b) no guarantee
> > that X sleepers require X request slots.
> 
> Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
> them), they _will_ use a request. I don't think we have to worry about
> that. At most we will wake up "too many" - we'll wake up processes even
> though they end up not being able to get a request anyway because somebody
> else got to it first. And that's ok. It's the "wake up too few" that
> causes trouble, and I think that will be fixed by my suggestion.

Yes, they may end up sleeping right away again, as per the above a) case
for instance. The logic now is 'we have X free slots now, wake up
X sleepers' where it instead should be 'we have X free slots now,
wake up people until the free list is exhausted'.
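
(As a sketch, that policy could look like this in the release path --
illustration only, not a patch: every release that leaves the freelist
non-empty while tasks are still waiting wakes one more sleeper, so blocked
tasks keep draining the freelist even when newcomers grab some of the slots.
The wake_up_nr(..., 2) patch posted elsewhere in the thread approximates
this:)

	/* after putting the freed request back on the freelist */
	if (!list_empty(&q->request_freelist[rw]) &&
	    waitqueue_active(&q->wait_for_request))
		wake_up_nr(&q->wait_for_request, 1);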

> Now, I'd worry if somebody wants several requests at the same time, and
> doesn't feed them to the IO layer until it has gotten all of them. In that
> case, you can get starvation, with many people having "reserved" their
> requests and there not being enough free requests around to actually ever
> wake anybody up again. But the regular IO paths do not do this: they will
> all allocate a request and just submit it immediately, no "reservation".

Right, the I/O path doesn't do this and it would seem more appropriate
to have such users use their own requests instead of eating from
the internal pool.

-- 
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs



Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)

2001-01-27 Thread Linus Torvalds



On Sun, 28 Jan 2001, Jens Axboe wrote:
> > 
> > So what happens is that somebody takes a page fault (and gets the mm
> > lock), tries to read something in, and never gets anything back, thus
> > leaving the MM locked.
> 
> What was the trace of this? Just curious, the below case outlined by
> Linus should be pretty generic, but I'd still like to know what
> can lead to this condition.

It was posted on linux-kernel - I don't save the dang things because I
have too much in my "archives" as is ;)

> > Lorenzo, does the problem go away for you if you remove the
> > 
> > if (!list_empty(&q->request_freelist[rw])) {
> > ...
> > }
> > 
> > code from blkdev_release_request() in drivers/block/ll_rw_blk.c?
> 
> Good spotting. Actually I see one more problem with it too. If
> we've started batching (under heavy I/O of course), we could
> splice the pending list and wake up X number of sleepers, but
> there's a) no guarantee that these sleepers will actually get
> the requests if new ones keep flooding in

(a) is ok. They'll go back to sleep - it's a loop waiting for requests..
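
(For reference, the loop in question -- a from-memory sketch of
__get_request_wait() in this kernel, not the exact source:)

static struct request *__get_request_wait(request_queue_t *q, int rw)
{
	struct request *rq;
	DECLARE_WAITQUEUE(wait, current);

	add_wait_queue(&q->wait_for_request, &wait);
	for (;;) {
		__set_current_state(TASK_UNINTERRUPTIBLE);
		spin_lock_irq(&io_request_lock);
		rq = get_request(q, rw);	/* NULL if the freelist is empty */
		spin_unlock_irq(&io_request_lock);
		if (rq)
			break;
		/*
		 * Kick any plugged queue so in-flight IO completes and
		 * releases requests, then sleep until somebody wakes us.
		 */
		generic_unplug_device(q);
		schedule();
	}
	remove_wait_queue(&q->wait_for_request, &wait);
	current->state = TASK_RUNNING;
	return rq;
}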

>and b) no guarantee
> that X sleepers require X request slots.

Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
them), they _will_ use a request. I don't think we have to worry about
that. At most we will wake up "too many" - we'll wake up processes even
though they end up not being able to get a request anyway because somebody
else got to it first. And that's ok. It's the "wake up too few" that
causes trouble, and I think that will be fixed by my suggestion.

Now, I'd worry if somebody wants several requests at the same time, and
doesn't feed them to the IO layer until it has gotten all of them. In that
case, you can get starvation, with many people having "reserved" their
requests and there not being enough free requests around to actually ever
wake anybody up again. But the regular IO paths do not do this: they will
all allocate a request and just submit it immediately, no "reservation".

(Obviously, _submitting_ the request doesn't mean that we'd actually start
processing it, but if somebody ends up waiting for requests they'll do the
unplug that does start it all going, so effectively we can think of it as
a logical "start this request now" thing even if it gets delayed in order
to coalesce IO).

Linus
