Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
On Sun, Jan 28 2001, Linus Torvalds wrote:
> On Sun, 28 Jan 2001, Jens Axboe wrote:
> >
> > How about this instead?
>
> I really don't like this one. It will basically re-introduce the old
> behaviour of waking people up in a trickle, as far as I can tell. The
> reason we want the batching is to make people have more requests to sort
> in the elevator, and as far as I can tell this will just hurt that.
>
> Are there any downsides to just _always_ batching, regardless of whether
> the request freelist is empty or not? Sure, it will make the "effective"
> size of the freelist a bit smaller, but that's probably not actually
> noticeable under any load except for the one that empties the freelist (in
> which case the old code would have triggered the batching anyway).

The problem with removing the !list_empty test like you suggested is
that batching is no longer controlled. If we start batching once the
lists are empty and start wakeups once batch_requests has been reached,
we know we'll give the elevator enough to work with to be effective.
With !list_empty removed, batch_requests is no longer a measure of how
many requests we want to batch.

Always batching is not a problem in itself; the effectively smaller
freelist should be negligible. The patch I sent will only trickle
wakeups when batching is already in effect but the batch_requests
wakeups were not enough to deplete the freelist again. At least that
was the intended effect :-)

> Performance numbers?

Don't have any right now, will test a bit later.

--
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
On Sun, 28 Jan 2001, Jens Axboe wrote:
>
> How about this instead?

I really don't like this one. It will basically re-introduce the old
behaviour of waking people up in a trickle, as far as I can tell. The
reason we want the batching is to make people have more requests to sort
in the elevator, and as far as I can tell this will just hurt that.

Are there any downsides to just _always_ batching, regardless of whether
the request freelist is empty or not? Sure, it will make the "effective"
size of the freelist a bit smaller, but that's probably not actually
noticeable under any load except for the one that empties the freelist (in
which case the old code would have triggered the batching anyway).

Performance numbers?

		Linus
Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
On Sun, Jan 28 2001, Lorenzo Allegrucci wrote:
> >Ho humm. Jens: imagine that you have more people waiting for requests than
> >"batchcount". Further, imagine that you have multiple requests finishing
> >at the same time. Not unlikely. Now, imagine that one request finishes,
> >and causes "batchcount" users to wake up, and immediately another request
> >finishes but THAT one doesn't wake anybody up because it notices that the
> >freelist isn't empty - so it thinks that it doesn't need to wake anybody.
> >
> >Lorenzo, does the problem go away for you if you remove the
> >
> >	if (!list_empty(&q->request_freelist[rw])) {
> >		...
> >	}
> >
> >code from blkdev_release_request() in drivers/block/ll_rw_block.c?
>
> Yes, it does.

How about this instead?

--- /opt/kernel/linux-2.4.1-pre10/drivers/block/ll_rw_blk.c	Thu Jan 25 19:15:12 2001
+++ drivers/block/ll_rw_blk.c	Sun Jan 28 19:22:20 2001
@@ -633,6 +634,8 @@
 	if (!list_empty(&q->request_freelist[rw])) {
 		blk_refill_freelist(q, rw);
 		list_add(&req->table, &q->request_freelist[rw]);
+		if (waitqueue_active(&q->wait_for_request))
+			wake_up_nr(&q->wait_for_request, 2);
 		return;
 	}

--
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs
Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
At 15.40 27/01/01 -0800, you wrote:
>On Sat, 27 Jan 2001, Lorenzo Allegrucci wrote:
>>
>> A trivial "while(1) fork()" is enough to trigger it.
>> "mem=32M" by lilo, ulimit -u is 1024.
>
>Hmm.. This does not look like a VM deadlock - it looks like some IO
>request is waiting forever on "__get_request_wait()". In fact, it looks
>like a _lot_ of people are waiting for requests.
>
>So what happens is that somebody takes a page fault (and gets the mm
>lock), tries to read something in, and never gets anything back, thus
>leaving the MM locked.
>
>Jens: this looks suspiciously like somebody isn't waking things up when
>they add requests back to the request lists. Alternatively, maybe the
>unplugging isn't properly done, so that we have a lot of pending IO that
>doesn't get started..
>
>Ho humm. Jens: imagine that you have more people waiting for requests than
>"batchcount". Further, imagine that you have multiple requests finishing
>at the same time. Not unlikely. Now, imagine that one request finishes,
>and causes "batchcount" users to wake up, and immediately another request
>finishes but THAT one doesn't wake anybody up because it notices that the
>freelist isn't empty - so it thinks that it doesn't need to wake anybody.
>
>Lorenzo, does the problem go away for you if you remove the
>
>	if (!list_empty(&q->request_freelist[rw])) {
>		...
>	}
>
>code from blkdev_release_request() in drivers/block/ll_rw_block.c?

Yes, it does.

--
Lorenzo
Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
On Sun, 28 Jan 2001, Jens Axboe wrote:
> On Sat, Jan 27 2001, Linus Torvalds wrote:
> > > What was the trace of this? Just curious, the below case outlined by
> > > Linus should be pretty generic, but I'd still like to know what
> > > can lead to this condition.
> >
> > It was posted on linux-kernel - I don't save the dang things because I
> > have too much in my "archives" as is ;)
>
> Ok I see it now, confused wrt the different threads...
>
> > > Good spotting. Actually I see one more problem with it too. If
> > > we've started batching (under heavy I/O of course), we could
> > > splice the pending list and wake up X number of sleepers, but
> > > there's a) no guarantee that these sleepers will actually get
> > > the requests if new ones keep flooding in
> >
> > (a) is ok. They'll go back to sleep - it's a loop waiting for requests..
>
> My point is not that it's broken, but it will favor newcomers
> instead of tasks having blocked on a free slot already. So it
> would still be nice to get right.
>
> > > and b) no guarantee
> > > that X sleepers require X request slots.
> >
> > Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
> > them), they _will_ use a request. I don't think we have to worry about
> > that. At most we will wake up "too many" - we'll wake up processes even
> > though they end up not being able to get a request anyway because somebody
> > else got to it first. And that's ok. It's the "wake up too few" that
> > causes trouble, and I think that will be fixed by my suggestion.
>
> Yes, they may end up sleeping right away again as per the above a) case
> for instance. The logic now is 'we have X free slots now, wake up
> X sleepers' where it instead should be 'we have X free slots now,
> wake up people until the free list is exhausted'.
>
> > Now, I'd worry if somebody wants several requests at the same time, and
> > doesn't feed them to the IO layer until it has gotten all of them. In that
> > case, you can get starvation with many people having "reserved" their
> > requests, and there not be enough free requests around to actually ever
> > wake anybody up again. But the regular IO paths do not do this: they will
> > all allocate a request and just submit it immediately, no "reservation".
>
> Right, the I/O path doesn't do this and it would seem more appropriate
> to have such users use their own requests instead of eating from
> the internal pool.
>
> --
> * Jens Axboe <[EMAIL PROTECTED]>
> * SuSE Labs
Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
On Sat, Jan 27 2001, Linus Torvalds wrote:
> > What was the trace of this? Just curious, the below case outlined by
> > Linus should be pretty generic, but I'd still like to know what
> > can lead to this condition.
>
> It was posted on linux-kernel - I don't save the dang things because I
> have too much in my "archives" as is ;)

Ok I see it now, confused wrt the different threads...

> > Good spotting. Actually I see one more problem with it too. If
> > we've started batching (under heavy I/O of course), we could
> > splice the pending list and wake up X number of sleepers, but
> > there's a) no guarantee that these sleepers will actually get
> > the requests if new ones keep flooding in
>
> (a) is ok. They'll go back to sleep - it's a loop waiting for requests..

My point is not that it's broken, but it will favor newcomers
instead of tasks having blocked on a free slot already. So it
would still be nice to get right.

> > and b) no guarantee
> > that X sleepers require X request slots.
>
> Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
> them), they _will_ use a request. I don't think we have to worry about
> that. At most we will wake up "too many" - we'll wake up processes even
> though they end up not being able to get a request anyway because somebody
> else got to it first. And that's ok. It's the "wake up too few" that
> causes trouble, and I think that will be fixed by my suggestion.

Yes, they may end up sleeping right away again as per the above a) case
for instance. The logic now is 'we have X free slots now, wake up
X sleepers' where it instead should be 'we have X free slots now,
wake up people until the free list is exhausted'.

> Now, I'd worry if somebody wants several requests at the same time, and
> doesn't feed them to the IO layer until it has gotten all of them. In that
> case, you can get starvation with many people having "reserved" their
> requests, and there not be enough free requests around to actually ever
> wake anybody up again. But the regular IO paths do not do this: they will
> all allocate a request and just submit it immediately, no "reservation".

Right, the I/O path doesn't do this and it would seem more appropriate
to have such users use their own requests instead of eating from
the internal pool.

--
* Jens Axboe <[EMAIL PROTECTED]>
* SuSE Labs
Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
On Sun, 28 Jan 2001, Jens Axboe wrote:
> >
> > So what happens is that somebody takes a page fault (and gets the mm
> > lock), tries to read something in, and never gets anything back, thus
> > leaving the MM locked.
>
> What was the trace of this? Just curious, the below case outlined by
> Linus should be pretty generic, but I'd still like to know what
> can lead to this condition.

It was posted on linux-kernel - I don't save the dang things because I
have too much in my "archives" as is ;)

> > Lorenzo, does the problem go away for you if you remove the
> >
> >	if (!list_empty(&q->request_freelist[rw])) {
> >		...
> >	}
> >
> > code from blkdev_release_request() in drivers/block/ll_rw_block.c?
>
> Good spotting. Actually I see one more problem with it too. If
> we've started batching (under heavy I/O of course), we could
> splice the pending list and wake up X number of sleepers, but
> there's a) no guarantee that these sleepers will actually get
> the requests if new ones keep flooding in

(a) is ok. They'll go back to sleep - it's a loop waiting for requests..

> and b) no guarantee
> that X sleepers require X request slots.

Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
them), they _will_ use a request. I don't think we have to worry about
that. At most we will wake up "too many" - we'll wake up processes even
though they end up not being able to get a request anyway because somebody
else got to it first. And that's ok. It's the "wake up too few" that
causes trouble, and I think that will be fixed by my suggestion.

Now, I'd worry if somebody wants several requests at the same time, and
doesn't feed them to the IO layer until it has gotten all of them. In that
case, you can get starvation with many people having "reserved" their
requests, and there not be enough free requests around to actually ever
wake anybody up again. But the regular IO paths do not do this: they will
all allocate a request and just submit it immediately, no "reservation".

(Obviously, _submitting_ the request doesn't mean that we'd actually
start processing it, but if somebody ends up waiting for requests they'll
do the unplug that does start it all going, so effectively we can think
of it as a logical "start this request now" thing even if it gets delayed
in order to coalesce IO).

		Linus