Re: Linux 2.2.19pre2
On Sun, Dec 24, 2000 at 11:23:33AM +1100, Andrew Morton wrote: > ack. This patch against 2.2.19pre3 should fix all races. (note that wait->flags doesn't need to be initialized in the critical section in test1X too) ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.19pre3/wake-one-3 Comments? Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Sun, Dec 24, 2000 at 04:17:10PM +1100, Andrew Morton wrote: > I was talking about a different scenario: > > add_wait_queue_exclusive(>wait_for_request, ); > for (;;) { > __set_current_state(TASK_UNINTERRUPTIBLE); > /* WINDOW */ > spin_lock_irq(_request_lock); > rq = get_request(q, rw); > spin_unlock_irq(_request_lock); > if (rq) > break; > generic_unplug_device(q); > schedule(); > } > remove_wait_queue(>wait_for_request, ); > > Suppose there are two tasks sleeping in the schedule(). > > A wakeup comes. One task wakes. It loops aound and reaches > the window. At this point in time, another wakeup gets sent > to the waitqueue. It gets directed to the task which just > woke up![..] Ok, this is a very minor window compared to the current one, but yes, that could happen too in test4. > I assume this is because this waitqueue gets lots of wakeups sent to it. It only gets the strictly necessary number of wakeups. > Linus suggested at one point that we clear the waitqueue's > WQ_FLAG_EXCLUSIVE bit when we wake it up, [..] .. and then set it after checking if a new request is available, just before schedule(). That would avoid the above race (and the one I mentioned in previous email) but it doesn't address the lost wakeups for example when setting USE_RW_WAIT_QUEUE_SPINLOCK to 1. Considering wakeups only the ones that moves the task to the runqueue will get rid of the races all together and it looks right conceptually so I prefer it. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Sun, Dec 24, 2000 at 11:23:33AM +1100, Andrew Morton wrote: ack. This patch against 2.2.19pre3 should fix all races. (note that wait-flags doesn't need to be initialized in the critical section in test1X too) ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.19pre3/wake-one-3 Comments? Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: > > On Sun, Dec 24, 2000 at 01:28:59PM +1100, Andrew Morton wrote: > > This could happen with the old scheme where exclusiveness > > was stored in the task, not the waitqueue. > > > > >From test4: > > > > for (;;) { > > __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE); > > /* WINDOW */ > > spin_lock_irq(_request_lock); > > rq = get_request(q, rw); > > spin_unlock_irq(_request_lock); > > if (rq) > > break; > > generic_unplug_device(q); > > schedule(); > > } > > > > As soon as we set TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE, this > > task becomes visible to __wake_up_common and can soak > > up a second wakeup. [..] > > I disagree, none wakeup could be lost in test4 in the above section of code > (besides the __set_current_state that should be set_current_state ;) and that's > not an issue for both x86 and alpha where spin_lock is a full memory barrier). > > Let's examine what happens in test4: > > [ snip ] I was talking about a different scenario: add_wait_queue_exclusive(>wait_for_request, ); for (;;) { __set_current_state(TASK_UNINTERRUPTIBLE); /* WINDOW */ spin_lock_irq(_request_lock); rq = get_request(q, rw); spin_unlock_irq(_request_lock); if (rq) break; generic_unplug_device(q); schedule(); } remove_wait_queue(>wait_for_request, ); Suppose there are two tasks sleeping in the schedule(). A wakeup comes. One task wakes. It loops aound and reaches the window. At this point in time, another wakeup gets sent to the waitqueue. It gets directed to the task which just woke up! It should have been directed to a sleeping task, not the current one which just *looks* like it's sleeping, because it's in state TASK_UNINTERRUPTIBLE. This can happen in test4. > This race is been introduced in test1X because there the task keeps asking for > an exclusive wakeup even after getting a wakeup: until it has the time to > cleanup and unregister explicitly. No, I think it's the same in test4 and test1X. In current kernels it's still the case that the woken task gets atomically switched into state TASK_RUNNING, and then becomes "hidden" from the wakeup code. The problem is, the task bogusly sets itself back into TASK_UNINTERRUPTIBLE for a very short period and becomes eligible for another wakeup. There's not much which ll_rw_blk.c can do about this. > And btw, 2.2.19pre3 has the same race, while the alternate wake-one > implementation in 2.2.18aa2 doesn't have it (for the same reasons). The above scenario can happen in all kernels. > And /* WINDOW */ is not the only window for such race: the window is the whole > section where the task is registered in the waitqueue in exclusive mode: > > add_wait_queue_exclusive > /* all is WINDOW here */ > remove_wait_queue > > Until remove_wait_queue runs explicitly in the task context any second wakeup > will keep waking up only such task (so in turn it will be lost). So it should > trigger very easily under high load since two wakeups will happen much faster > compared to the task that needs to be rescheduled in order run > remove_wait_queue and cleanup. > > Any deadlocks in test1X could be very well explained by this race condition. Yes, but I haven't seen a "stuck in D state" report for a while. I assume this is because this waitqueue gets lots of wakeups sent to it. The ones in networking I expect are protected by other locking which serialises the wakeups. The one in the x86 semaphores avoids it by sending an extra wakeup to the waitqueue on the way out. > > Still, changing the wakeup code in the way we've discussed > > seems the way to go. [..] > > I agree. I'm quite convinced it's right way too and of course I see why we > moved the exclusive information in the waitqueue instead of keeping it in the > task struct to provide sane semantics in the long run ;). Linus suggested at one point that we clear the waitqueue's WQ_FLAG_EXCLUSIVE bit when we wake it up, so it falls back to non-exclusive mode. This was to address one of the task-on-two-waitqueues problems... - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Sun, Dec 24, 2000 at 01:28:59PM +1100, Andrew Morton wrote: > This could happen with the old scheme where exclusiveness > was stored in the task, not the waitqueue. > > >From test4: > > for (;;) { > __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE); > /* WINDOW */ > spin_lock_irq(_request_lock); > rq = get_request(q, rw); > spin_unlock_irq(_request_lock); > if (rq) > break; > generic_unplug_device(q); > schedule(); > } > > As soon as we set TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE, this > task becomes visible to __wake_up_common and can soak > up a second wakeup. [..] I disagree, none wakeup could be lost in test4 in the above section of code (besides the __set_current_state that should be set_current_state ;) and that's not an issue for both x86 and alpha where spin_lock is a full memory barrier). Let's examine what happens in test4: 1) before the `for' loop, the task is just visible to __wake_up_common 2) as soon as we set the task state to TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE the task _start_ asking for an exclusive wakeup from __wake_up_common At this point consider two different cases: 3a) assume we get a wakup in the /* WINDOW */: the task->state become runnable from under us (no problem, that's expected). And if it's true we got a wakeup in /* WINDOW */, then it means get_request will also return a valid rq and we'll break the loop and we'll use the resource. No wakeup lost in this case. So far so good. This was the 1 wakeup case. 3b) Assume _two_ wakeups happens in /* WINDOW */: the first one just make us RUNNABLE as in the 3a) case, the second one will wakeup any _other_ wake-one task _after_ us in the wait queue (normal FIFO order), because the first wakeup implicitly made us RUNNABLE (not anymore TASK_EXCLUSIVE so we were not asking anymore for an exclusive wakeup). The task after us may as well be running in /* WINDOW */ and in such case it will get as well a `rq' from get_request (as in case 3a) without having to sleep). All right. No wakeup could be lost in any case. Two wakeups happened and two tasks got their I/O request even if both wakeups happend in /* WINDOW */. Example with N wakeups and N tasks is equivalent. This race is been introduced in test1X because there the task keeps asking for an exclusive wakeup even after getting a wakeup: until it has the time to cleanup and unregister explicitly. And btw, 2.2.19pre3 has the same race, while the alternate wake-one implementation in 2.2.18aa2 doesn't have it (for the same reasons). And /* WINDOW */ is not the only window for such race: the window is the whole section where the task is registered in the waitqueue in exclusive mode: add_wait_queue_exclusive /* all is WINDOW here */ remove_wait_queue Until remove_wait_queue runs explicitly in the task context any second wakeup will keep waking up only such task (so in turn it will be lost). So it should trigger very easily under high load since two wakeups will happen much faster compared to the task that needs to be rescheduled in order run remove_wait_queue and cleanup. Any deadlocks in test1X could be very well explained by this race condition. > Still, changing the wakeup code in the way we've discussed > seems the way to go. [..] I agree. I'm quite convinced it's right way too and of course I see why we moved the exclusive information in the waitqueue instead of keeping it in the task struct to provide sane semantics in the long run ;). Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: > > ... > > if (rq) > > break; > > generic_unplug_device(q); > > schedule(); > > } > > remove_wait_queue(>wait_for_request, ); > > current->state = TASK_RUNNING; > > return rq; > > } > > > > If this task enters the schedule() and is then woken, and another > > wakeup is sent to the waitqueue while this task is executing in > > the marked window, __wake_up_common() will try to wake this > > task a second time and will then stop looking for tasks to wake. > > > > The outcome: two wakeups sent to the queue, but only one task woken. > > Correct. > > And btw such race is new and it must been introduced in late 2.4.0-test1X or > so, I'm sure it couldn't happen in whole 2.3.x and 2.4.0-testX because the > wakeup was clearing atomically the exclusive bit from the task->state. > This could happen with the old scheme where exclusiveness was stored in the task, not the waitqueue. >From test4: for (;;) { __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE); /* WINDOW */ spin_lock_irq(_request_lock); rq = get_request(q, rw); spin_unlock_irq(_request_lock); if (rq) break; generic_unplug_device(q); schedule(); } As soon as we set TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE, this task becomes visible to __wake_up_common and can soak up a second wakeup. I assume this hasn't been a reported problem because request queues get lots of wakeups sent to them? Still, changing the wakeup code in the way we've discussed seems the way to go. It also makes the extra wake_up() at the end of x86's __down() unnecessary, which is a small performance win - semaphores are currently wake-two. But Linus had a different reason why that wakeup was there. Need to dig out the email and stare at it. But I don't see a good reason to muck with the semaphores at this time. - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Sun, Dec 24, 2000 at 11:23:33AM +1100, Andrew Morton wrote: > Andrea Arcangeli wrote: > > 1) could be fixed trivially by making the waitqueue_lock a spinlock, but > > this way doesn't solve 2). And if we solve 2) properly than 1) gets fixed as BTW (follow up myself), really making the lock a spinlock (not a readwrite lock) would fix 2) as well (waitqueue_lock is global in 2.2.x I was thinking at the per-waitqueue lock of 2.4.x ;). > > well. > > I don't understand the problem with 2) in 2.2? Every task on both waitqueues > gets woken up. Won't it sort itself out OK? Not every task if it's a wake-one on both waitqueues. The problem should be the same in 2.2.x and 2.4.x. But if such usage makes sense is uncertain... > For 2.4, 2) is an issue because we can have tasks on two waitqueues at the > same time, with a mix of exclusive and non. Putting a global spinlock > into __wake_up_common would fix it, but was described as "horrid" by > you-know-who :) Yes. And that wouldn't fix the race number 3) below. > > I agree the right fix for 2) (and in turn for 1) ) is to count the number of > > exclusive wake_up_process that moves the task in the runqueue, if the task was > > just in the runqueue we must not consider it as an exclusive wakeup (so in turn > > we'll try again to wakeup the next exclusive-wakeup waiter). This will > > fix both races. Since the fix is self contained in __wake_up it's fine > > for 2.2.19pre3 as well and we can keep using a read_write lock then. > > I really like this approach. It fixes another problem in 2.4: > > Example: > > static struct request *__get_request_wait(request_queue_t *q, int rw) > { > register struct request *rq; > DECLARE_WAITQUEUE(wait, current); > > add_wait_queue_exclusive(>wait_for_request, ); > for (;;) { > __set_current_state(TASK_UNINTERRUPTIBLE); > /* WINDOW HERE */ > spin_lock_irq(_request_lock); > rq = get_request(q, rw); > spin_unlock_irq(_request_lock); note that the above is racy and can lose a wakeup, 2.4.x needs set_current_state there (not __set_current_state): spin_lock isn't a two-way barrier, it only forbids stuff ot exit the critical section. So on some architecture (not on the alpha for example) the cpu could reorder the code this way: spin_lock_irq() rq = get_request __set_current_state spin_unlock_irq So inverting the order of operations. That needs to be fixed too (luckily it's a one liner). > if (rq) > break; > generic_unplug_device(q); > schedule(); > } > remove_wait_queue(>wait_for_request, ); > current->state = TASK_RUNNING; > return rq; > } > > If this task enters the schedule() and is then woken, and another > wakeup is sent to the waitqueue while this task is executing in > the marked window, __wake_up_common() will try to wake this > task a second time and will then stop looking for tasks to wake. > > The outcome: two wakeups sent to the queue, but only one task woken. Correct. And btw such race is new and it must been introduced in late 2.4.0-test1X or so, I'm sure it couldn't happen in whole 2.3.x and 2.4.0-testX because the wakeup was clearing atomically the exclusive bit from the task->state. Still talking about late 2.4.x changes, why add_wait_queue_exclusive gone in kernel/fork.c?? That's obviously not the right place :). > I haven't thought about it super-hard, but I think that if > __wake_up_common's exclusive-mode handling were changed > as you describe, so that it keeps on scanning the queue until it has > *definitely* moved a task onto the runqueue then this > problem goes away. Yes, that's true. > > Those races of course are orthogonal with the issue we discussed previously > > in this thread: a task registered in two waitqueues and wanting an exclusive > > wakeup from one waitqueue and a wake-all from the other waitqueue (for > > addressing that we need to move the wake-one information from the task struct > > to the waitqueue_head and I still think that shoudln't be addressed in 2.2.x, > > 2.2.x is fine with a per-task-struct wake-one information) > > OK by me, as long as people don't uncautiously start using the > capability for other things. > > > Should I take care of the 2.2.x fix, or will you take care of it? I'm not using > > the wake-one patch in 2.2.19pre3 because I don't like it (starting from the > > useless wmb() in accept) so if you want to take care of 2.2.19pre3 yourself I'd > > suggest to apply the wake-one patch against 2.2.19pre3 in my ftp-patch area > > first. Otherwise give me an ack and I'll extend myself my wake-one patch to > > ignore the wake_up_process()es that doesn't move the task in the runqueue. > > ack. > > I'll take another look at the 2.4 patch and ask you to review that > when I've finished with the netdevice wetworks, if
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: > > On Sat, Dec 23, 2000 at 05:56:42PM +1100, Andrew Morton wrote: > > If we elect to not address this problem in 2.2 and to rely upon the network > > I see. There are two races: > > 1) race inside __wake_up when it's run on the same waitqueue: 2.2.19pre3 > is affected as well as 2.2.18aa2, and 2.4.x is affected > as well when #defining USE_RW_WAIT_QUEUE_SPINLOCK to 1. mm... I think we should kill the USE_RW_WAIT_QUEUE_SPINLOCK option. > 2) race between two parallel __wake_up running on different waitqueues > (both 2.2.x and 2.4.x are affected) > > 1) could be fixed trivially by making the waitqueue_lock a spinlock, but > this way doesn't solve 2). And if we solve 2) properly than 1) gets fixed as > well. I don't understand the problem with 2) in 2.2? Every task on both waitqueues gets woken up. Won't it sort itself out OK? For 2.4, 2) is an issue because we can have tasks on two waitqueues at the same time, with a mix of exclusive and non. Putting a global spinlock into __wake_up_common would fix it, but was described as "horrid" by you-know-who :) > I agree the right fix for 2) (and in turn for 1) ) is to count the number of > exclusive wake_up_process that moves the task in the runqueue, if the task was > just in the runqueue we must not consider it as an exclusive wakeup (so in turn > we'll try again to wakeup the next exclusive-wakeup waiter). This will > fix both races. Since the fix is self contained in __wake_up it's fine > for 2.2.19pre3 as well and we can keep using a read_write lock then. I really like this approach. It fixes another problem in 2.4: Example: static struct request *__get_request_wait(request_queue_t *q, int rw) { register struct request *rq; DECLARE_WAITQUEUE(wait, current); add_wait_queue_exclusive(>wait_for_request, ); for (;;) { __set_current_state(TASK_UNINTERRUPTIBLE); /* WINDOW HERE */ spin_lock_irq(_request_lock); rq = get_request(q, rw); spin_unlock_irq(_request_lock); if (rq) break; generic_unplug_device(q); schedule(); } remove_wait_queue(>wait_for_request, ); current->state = TASK_RUNNING; return rq; } If this task enters the schedule() and is then woken, and another wakeup is sent to the waitqueue while this task is executing in the marked window, __wake_up_common() will try to wake this task a second time and will then stop looking for tasks to wake. The outcome: two wakeups sent to the queue, but only one task woken. I haven't thought about it super-hard, but I think that if __wake_up_common's exclusive-mode handling were changed as you describe, so that it keeps on scanning the queue until it has *definitely* moved a task onto the runqueue then this problem goes away. > Those races of course are orthogonal with the issue we discussed previously > in this thread: a task registered in two waitqueues and wanting an exclusive > wakeup from one waitqueue and a wake-all from the other waitqueue (for > addressing that we need to move the wake-one information from the task struct > to the waitqueue_head and I still think that shoudln't be addressed in 2.2.x, > 2.2.x is fine with a per-task-struct wake-one information) OK by me, as long as people don't uncautiously start using the capability for other things. > Should I take care of the 2.2.x fix, or will you take care of it? I'm not using > the wake-one patch in 2.2.19pre3 because I don't like it (starting from the > useless wmb() in accept) so if you want to take care of 2.2.19pre3 yourself I'd > suggest to apply the wake-one patch against 2.2.19pre3 in my ftp-patch area > first. Otherwise give me an ack and I'll extend myself my wake-one patch to > ignore the wake_up_process()es that doesn't move the task in the runqueue. ack. I'll take another look at the 2.4 patch and ask you to review that when I've finished with the netdevice wetworks, if that's OK. - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Sat, Dec 23, 2000 at 05:56:42PM +1100, Andrew Morton wrote: > If we elect to not address this problem in 2.2 and to rely upon the network I see. There are two races: 1) race inside __wake_up when it's run on the same waitqueue: 2.2.19pre3 is affected as well as 2.2.18aa2, and 2.4.x is affected as well when #defining USE_RW_WAIT_QUEUE_SPINLOCK to 1. 2) race between two parallel __wake_up running on different waitqueues (both 2.2.x and 2.4.x are affected) 1) could be fixed trivially by making the waitqueue_lock a spinlock, but this way doesn't solve 2). And if we solve 2) properly than 1) gets fixed as well. I agree the right fix for 2) (and in turn for 1) ) is to count the number of exclusive wake_up_process that moves the task in the runqueue, if the task was just in the runqueue we must not consider it as an exclusive wakeup (so in turn we'll try again to wakeup the next exclusive-wakeup waiter). This will fix both races. Since the fix is self contained in __wake_up it's fine for 2.2.19pre3 as well and we can keep using a read_write lock then. Those races of course are orthogonal with the issue we discussed previously in this thread: a task registered in two waitqueues and wanting an exclusive wakeup from one waitqueue and a wake-all from the other waitqueue (for addressing that we need to move the wake-one information from the task struct to the waitqueue_head and I still think that shoudln't be addressed in 2.2.x, 2.2.x is fine with a per-task-struct wake-one information) Should I take care of the 2.2.x fix, or will you take care of it? I'm not using the wake-one patch in 2.2.19pre3 because I don't like it (starting from the useless wmb() in accept) so if you want to take care of 2.2.19pre3 yourself I'd suggest to apply the wake-one patch against 2.2.19pre3 in my ftp-patch area first. Otherwise give me an ack and I'll extend myself my wake-one patch to ignore the wake_up_process()es that doesn't move the task in the runqueue. Thanks, Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Sat, Dec 23, 2000 at 05:56:42PM +1100, Andrew Morton wrote: If we elect to not address this problem in 2.2 and to rely upon the network I see. There are two races: 1) race inside __wake_up when it's run on the same waitqueue: 2.2.19pre3 is affected as well as 2.2.18aa2, and 2.4.x is affected as well when #defining USE_RW_WAIT_QUEUE_SPINLOCK to 1. 2) race between two parallel __wake_up running on different waitqueues (both 2.2.x and 2.4.x are affected) 1) could be fixed trivially by making the waitqueue_lock a spinlock, but this way doesn't solve 2). And if we solve 2) properly than 1) gets fixed as well. I agree the right fix for 2) (and in turn for 1) ) is to count the number of exclusive wake_up_process that moves the task in the runqueue, if the task was just in the runqueue we must not consider it as an exclusive wakeup (so in turn we'll try again to wakeup the next exclusive-wakeup waiter). This will fix both races. Since the fix is self contained in __wake_up it's fine for 2.2.19pre3 as well and we can keep using a read_write lock then. Those races of course are orthogonal with the issue we discussed previously in this thread: a task registered in two waitqueues and wanting an exclusive wakeup from one waitqueue and a wake-all from the other waitqueue (for addressing that we need to move the wake-one information from the task struct to the waitqueue_head and I still think that shoudln't be addressed in 2.2.x, 2.2.x is fine with a per-task-struct wake-one information) Should I take care of the 2.2.x fix, or will you take care of it? I'm not using the wake-one patch in 2.2.19pre3 because I don't like it (starting from the useless wmb() in accept) so if you want to take care of 2.2.19pre3 yourself I'd suggest to apply the wake-one patch against 2.2.19pre3 in my ftp-patch area first. Otherwise give me an ack and I'll extend myself my wake-one patch to ignore the wake_up_process()es that doesn't move the task in the runqueue. Thanks, Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: On Sat, Dec 23, 2000 at 05:56:42PM +1100, Andrew Morton wrote: If we elect to not address this problem in 2.2 and to rely upon the network I see. There are two races: 1) race inside __wake_up when it's run on the same waitqueue: 2.2.19pre3 is affected as well as 2.2.18aa2, and 2.4.x is affected as well when #defining USE_RW_WAIT_QUEUE_SPINLOCK to 1. mm... I think we should kill the USE_RW_WAIT_QUEUE_SPINLOCK option. 2) race between two parallel __wake_up running on different waitqueues (both 2.2.x and 2.4.x are affected) 1) could be fixed trivially by making the waitqueue_lock a spinlock, but this way doesn't solve 2). And if we solve 2) properly than 1) gets fixed as well. I don't understand the problem with 2) in 2.2? Every task on both waitqueues gets woken up. Won't it sort itself out OK? For 2.4, 2) is an issue because we can have tasks on two waitqueues at the same time, with a mix of exclusive and non. Putting a global spinlock into __wake_up_common would fix it, but was described as "horrid" by you-know-who :) I agree the right fix for 2) (and in turn for 1) ) is to count the number of exclusive wake_up_process that moves the task in the runqueue, if the task was just in the runqueue we must not consider it as an exclusive wakeup (so in turn we'll try again to wakeup the next exclusive-wakeup waiter). This will fix both races. Since the fix is self contained in __wake_up it's fine for 2.2.19pre3 as well and we can keep using a read_write lock then. I really like this approach. It fixes another problem in 2.4: Example: static struct request *__get_request_wait(request_queue_t *q, int rw) { register struct request *rq; DECLARE_WAITQUEUE(wait, current); add_wait_queue_exclusive(q-wait_for_request, wait); for (;;) { __set_current_state(TASK_UNINTERRUPTIBLE); /* WINDOW HERE */ spin_lock_irq(io_request_lock); rq = get_request(q, rw); spin_unlock_irq(io_request_lock); if (rq) break; generic_unplug_device(q); schedule(); } remove_wait_queue(q-wait_for_request, wait); current-state = TASK_RUNNING; return rq; } If this task enters the schedule() and is then woken, and another wakeup is sent to the waitqueue while this task is executing in the marked window, __wake_up_common() will try to wake this task a second time and will then stop looking for tasks to wake. The outcome: two wakeups sent to the queue, but only one task woken. I haven't thought about it super-hard, but I think that if __wake_up_common's exclusive-mode handling were changed as you describe, so that it keeps on scanning the queue until it has *definitely* moved a task onto the runqueue then this problem goes away. Those races of course are orthogonal with the issue we discussed previously in this thread: a task registered in two waitqueues and wanting an exclusive wakeup from one waitqueue and a wake-all from the other waitqueue (for addressing that we need to move the wake-one information from the task struct to the waitqueue_head and I still think that shoudln't be addressed in 2.2.x, 2.2.x is fine with a per-task-struct wake-one information) OK by me, as long as people don't uncautiously start using the capability for other things. Should I take care of the 2.2.x fix, or will you take care of it? I'm not using the wake-one patch in 2.2.19pre3 because I don't like it (starting from the useless wmb() in accept) so if you want to take care of 2.2.19pre3 yourself I'd suggest to apply the wake-one patch against 2.2.19pre3 in my ftp-patch area first. Otherwise give me an ack and I'll extend myself my wake-one patch to ignore the wake_up_process()es that doesn't move the task in the runqueue. ack. I'll take another look at the 2.4 patch and ask you to review that when I've finished with the netdevice wetworks, if that's OK. - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Sun, Dec 24, 2000 at 11:23:33AM +1100, Andrew Morton wrote: Andrea Arcangeli wrote: 1) could be fixed trivially by making the waitqueue_lock a spinlock, but this way doesn't solve 2). And if we solve 2) properly than 1) gets fixed as BTW (follow up myself), really making the lock a spinlock (not a readwrite lock) would fix 2) as well (waitqueue_lock is global in 2.2.x I was thinking at the per-waitqueue lock of 2.4.x ;). well. I don't understand the problem with 2) in 2.2? Every task on both waitqueues gets woken up. Won't it sort itself out OK? Not every task if it's a wake-one on both waitqueues. The problem should be the same in 2.2.x and 2.4.x. But if such usage makes sense is uncertain... For 2.4, 2) is an issue because we can have tasks on two waitqueues at the same time, with a mix of exclusive and non. Putting a global spinlock into __wake_up_common would fix it, but was described as "horrid" by you-know-who :) Yes. And that wouldn't fix the race number 3) below. I agree the right fix for 2) (and in turn for 1) ) is to count the number of exclusive wake_up_process that moves the task in the runqueue, if the task was just in the runqueue we must not consider it as an exclusive wakeup (so in turn we'll try again to wakeup the next exclusive-wakeup waiter). This will fix both races. Since the fix is self contained in __wake_up it's fine for 2.2.19pre3 as well and we can keep using a read_write lock then. I really like this approach. It fixes another problem in 2.4: Example: static struct request *__get_request_wait(request_queue_t *q, int rw) { register struct request *rq; DECLARE_WAITQUEUE(wait, current); add_wait_queue_exclusive(q-wait_for_request, wait); for (;;) { __set_current_state(TASK_UNINTERRUPTIBLE); /* WINDOW HERE */ spin_lock_irq(io_request_lock); rq = get_request(q, rw); spin_unlock_irq(io_request_lock); note that the above is racy and can lose a wakeup, 2.4.x needs set_current_state there (not __set_current_state): spin_lock isn't a two-way barrier, it only forbids stuff ot exit the critical section. So on some architecture (not on the alpha for example) the cpu could reorder the code this way: spin_lock_irq() rq = get_request __set_current_state spin_unlock_irq So inverting the order of operations. That needs to be fixed too (luckily it's a one liner). if (rq) break; generic_unplug_device(q); schedule(); } remove_wait_queue(q-wait_for_request, wait); current-state = TASK_RUNNING; return rq; } If this task enters the schedule() and is then woken, and another wakeup is sent to the waitqueue while this task is executing in the marked window, __wake_up_common() will try to wake this task a second time and will then stop looking for tasks to wake. The outcome: two wakeups sent to the queue, but only one task woken. Correct. And btw such race is new and it must been introduced in late 2.4.0-test1X or so, I'm sure it couldn't happen in whole 2.3.x and 2.4.0-testX because the wakeup was clearing atomically the exclusive bit from the task-state. Still talking about late 2.4.x changes, why add_wait_queue_exclusive gone in kernel/fork.c?? That's obviously not the right place :). I haven't thought about it super-hard, but I think that if __wake_up_common's exclusive-mode handling were changed as you describe, so that it keeps on scanning the queue until it has *definitely* moved a task onto the runqueue then this problem goes away. Yes, that's true. Those races of course are orthogonal with the issue we discussed previously in this thread: a task registered in two waitqueues and wanting an exclusive wakeup from one waitqueue and a wake-all from the other waitqueue (for addressing that we need to move the wake-one information from the task struct to the waitqueue_head and I still think that shoudln't be addressed in 2.2.x, 2.2.x is fine with a per-task-struct wake-one information) OK by me, as long as people don't uncautiously start using the capability for other things. Should I take care of the 2.2.x fix, or will you take care of it? I'm not using the wake-one patch in 2.2.19pre3 because I don't like it (starting from the useless wmb() in accept) so if you want to take care of 2.2.19pre3 yourself I'd suggest to apply the wake-one patch against 2.2.19pre3 in my ftp-patch area first. Otherwise give me an ack and I'll extend myself my wake-one patch to ignore the wake_up_process()es that doesn't move the task in the runqueue. ack. I'll take another look at the 2.4 patch and ask you to review that when I've finished with the netdevice wetworks, if that's OK. OK. Thanks for the help. Andrea - To unsubscribe from this list: send the
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: ... if (rq) break; generic_unplug_device(q); schedule(); } remove_wait_queue(q-wait_for_request, wait); current-state = TASK_RUNNING; return rq; } If this task enters the schedule() and is then woken, and another wakeup is sent to the waitqueue while this task is executing in the marked window, __wake_up_common() will try to wake this task a second time and will then stop looking for tasks to wake. The outcome: two wakeups sent to the queue, but only one task woken. Correct. And btw such race is new and it must been introduced in late 2.4.0-test1X or so, I'm sure it couldn't happen in whole 2.3.x and 2.4.0-testX because the wakeup was clearing atomically the exclusive bit from the task-state. This could happen with the old scheme where exclusiveness was stored in the task, not the waitqueue. From test4: for (;;) { __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE); /* WINDOW */ spin_lock_irq(io_request_lock); rq = get_request(q, rw); spin_unlock_irq(io_request_lock); if (rq) break; generic_unplug_device(q); schedule(); } As soon as we set TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE, this task becomes visible to __wake_up_common and can soak up a second wakeup. I assume this hasn't been a reported problem because request queues get lots of wakeups sent to them? Still, changing the wakeup code in the way we've discussed seems the way to go. It also makes the extra wake_up() at the end of x86's __down() unnecessary, which is a small performance win - semaphores are currently wake-two. But Linus had a different reason why that wakeup was there. Need to dig out the email and stare at it. But I don't see a good reason to muck with the semaphores at this time. - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Sun, Dec 24, 2000 at 01:28:59PM +1100, Andrew Morton wrote: This could happen with the old scheme where exclusiveness was stored in the task, not the waitqueue. From test4: for (;;) { __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE); /* WINDOW */ spin_lock_irq(io_request_lock); rq = get_request(q, rw); spin_unlock_irq(io_request_lock); if (rq) break; generic_unplug_device(q); schedule(); } As soon as we set TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE, this task becomes visible to __wake_up_common and can soak up a second wakeup. [..] I disagree, none wakeup could be lost in test4 in the above section of code (besides the __set_current_state that should be set_current_state ;) and that's not an issue for both x86 and alpha where spin_lock is a full memory barrier). Let's examine what happens in test4: 1) before the `for' loop, the task is just visible to __wake_up_common 2) as soon as we set the task state to TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE the task _start_ asking for an exclusive wakeup from __wake_up_common At this point consider two different cases: 3a) assume we get a wakup in the /* WINDOW */: the task-state become runnable from under us (no problem, that's expected). And if it's true we got a wakeup in /* WINDOW */, then it means get_request will also return a valid rq and we'll break the loop and we'll use the resource. No wakeup lost in this case. So far so good. This was the 1 wakeup case. 3b) Assume _two_ wakeups happens in /* WINDOW */: the first one just make us RUNNABLE as in the 3a) case, the second one will wakeup any _other_ wake-one task _after_ us in the wait queue (normal FIFO order), because the first wakeup implicitly made us RUNNABLE (not anymore TASK_EXCLUSIVE so we were not asking anymore for an exclusive wakeup). The task after us may as well be running in /* WINDOW */ and in such case it will get as well a `rq' from get_request (as in case 3a) without having to sleep). All right. No wakeup could be lost in any case. Two wakeups happened and two tasks got their I/O request even if both wakeups happend in /* WINDOW */. Example with N wakeups and N tasks is equivalent. This race is been introduced in test1X because there the task keeps asking for an exclusive wakeup even after getting a wakeup: until it has the time to cleanup and unregister explicitly. And btw, 2.2.19pre3 has the same race, while the alternate wake-one implementation in 2.2.18aa2 doesn't have it (for the same reasons). And /* WINDOW */ is not the only window for such race: the window is the whole section where the task is registered in the waitqueue in exclusive mode: add_wait_queue_exclusive /* all is WINDOW here */ remove_wait_queue Until remove_wait_queue runs explicitly in the task context any second wakeup will keep waking up only such task (so in turn it will be lost). So it should trigger very easily under high load since two wakeups will happen much faster compared to the task that needs to be rescheduled in order run remove_wait_queue and cleanup. Any deadlocks in test1X could be very well explained by this race condition. Still, changing the wakeup code in the way we've discussed seems the way to go. [..] I agree. I'm quite convinced it's right way too and of course I see why we moved the exclusive information in the waitqueue instead of keeping it in the task struct to provide sane semantics in the long run ;). Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: On Sun, Dec 24, 2000 at 01:28:59PM +1100, Andrew Morton wrote: This could happen with the old scheme where exclusiveness was stored in the task, not the waitqueue. From test4: for (;;) { __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE); /* WINDOW */ spin_lock_irq(io_request_lock); rq = get_request(q, rw); spin_unlock_irq(io_request_lock); if (rq) break; generic_unplug_device(q); schedule(); } As soon as we set TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE, this task becomes visible to __wake_up_common and can soak up a second wakeup. [..] I disagree, none wakeup could be lost in test4 in the above section of code (besides the __set_current_state that should be set_current_state ;) and that's not an issue for both x86 and alpha where spin_lock is a full memory barrier). Let's examine what happens in test4: [ snip ] I was talking about a different scenario: add_wait_queue_exclusive(q-wait_for_request, wait); for (;;) { __set_current_state(TASK_UNINTERRUPTIBLE); /* WINDOW */ spin_lock_irq(io_request_lock); rq = get_request(q, rw); spin_unlock_irq(io_request_lock); if (rq) break; generic_unplug_device(q); schedule(); } remove_wait_queue(q-wait_for_request, wait); Suppose there are two tasks sleeping in the schedule(). A wakeup comes. One task wakes. It loops aound and reaches the window. At this point in time, another wakeup gets sent to the waitqueue. It gets directed to the task which just woke up! It should have been directed to a sleeping task, not the current one which just *looks* like it's sleeping, because it's in state TASK_UNINTERRUPTIBLE. This can happen in test4. This race is been introduced in test1X because there the task keeps asking for an exclusive wakeup even after getting a wakeup: until it has the time to cleanup and unregister explicitly. No, I think it's the same in test4 and test1X. In current kernels it's still the case that the woken task gets atomically switched into state TASK_RUNNING, and then becomes "hidden" from the wakeup code. The problem is, the task bogusly sets itself back into TASK_UNINTERRUPTIBLE for a very short period and becomes eligible for another wakeup. There's not much which ll_rw_blk.c can do about this. And btw, 2.2.19pre3 has the same race, while the alternate wake-one implementation in 2.2.18aa2 doesn't have it (for the same reasons). The above scenario can happen in all kernels. And /* WINDOW */ is not the only window for such race: the window is the whole section where the task is registered in the waitqueue in exclusive mode: add_wait_queue_exclusive /* all is WINDOW here */ remove_wait_queue Until remove_wait_queue runs explicitly in the task context any second wakeup will keep waking up only such task (so in turn it will be lost). So it should trigger very easily under high load since two wakeups will happen much faster compared to the task that needs to be rescheduled in order run remove_wait_queue and cleanup. Any deadlocks in test1X could be very well explained by this race condition. Yes, but I haven't seen a "stuck in D state" report for a while. I assume this is because this waitqueue gets lots of wakeups sent to it. The ones in networking I expect are protected by other locking which serialises the wakeups. The one in the x86 semaphores avoids it by sending an extra wakeup to the waitqueue on the way out. Still, changing the wakeup code in the way we've discussed seems the way to go. [..] I agree. I'm quite convinced it's right way too and of course I see why we moved the exclusive information in the waitqueue instead of keeping it in the task struct to provide sane semantics in the long run ;). Linus suggested at one point that we clear the waitqueue's WQ_FLAG_EXCLUSIVE bit when we wake it up, so it falls back to non-exclusive mode. This was to address one of the task-on-two-waitqueues problems... - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: > > [ 2.2 waitqueue stuff ] > Andrea, it occurs to me... __wake_up() is using read_lock(waitqueue_lock). This means that two CPUs could simultaneously run __wake_up(). They could both find the same task and they could both try to wake it up. The net result would be two wakeup events, but only one task woken up. That's a lost wakeup. Now, it's probably the case that the 2.2 network serialisation prevents this from happening - I haven't looked. If not, then we need to use spin_lock_irqsave. Alternatively, we can teach wake_up_process to return a success value if it successfully moved a task onto the runqueue. Test that in __wake_up and keep on going if it's zero. 2.4 needs this as well, BTW, to fix an SMP race where a task is on two waitqueues at the same time. I did a patch which took the "wake_up_process returns success value" approach and it felt clean. I haven't submitted it due to general end-of-year patch exhaustion :) If we elect to not address this problem in 2.2 and to rely upon the network locking then we need to put a BIG FAT warning in the code somewhere, because if someone else tries to use the new wake-one capability, they may not be so lucky. - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Fri, Dec 22, 2000 at 06:33:00PM +1100, Andrew Morton wrote: > add_waitqueue_exclusive() and TASK_EXCLUSIVE, add a There's no add_waitqueue_exclusive in my patch. > Except for this bit, which looks slightly fatal: > > /* > * We can drop the read-lock early if this > * is the only/last process. > */ > if (next == head) { > read_unlock(_lock); > wake_up_process(p); > goto out; > } > > Once the waitqueue_lock has been dropped, the task at `p' > is free to remove itself from the waitqueue and exit. This > CPU can then try to wake up a non-existent task, no? Yes, that was an unlikely-to-happen SMP race I inerith from 2.2.18 and all previous 2.2.x vanilla kernels. Thanks. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Fri, Dec 22, 2000 at 06:33:00PM +1100, Andrew Morton wrote: add_waitqueue_exclusive() and TASK_EXCLUSIVE, add a There's no add_waitqueue_exclusive in my patch. Except for this bit, which looks slightly fatal: /* * We can drop the read-lock early if this * is the only/last process. */ if (next == head) { read_unlock(waitqueue_lock); wake_up_process(p); goto out; } Once the waitqueue_lock has been dropped, the task at `p' is free to remove itself from the waitqueue and exit. This CPU can then try to wake up a non-existent task, no? Yes, that was an unlikely-to-happen SMP race I inerith from 2.2.18 and all previous 2.2.x vanilla kernels. Thanks. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: [ 2.2 waitqueue stuff ] Andrea, it occurs to me... __wake_up() is using read_lock(waitqueue_lock). This means that two CPUs could simultaneously run __wake_up(). They could both find the same task and they could both try to wake it up. The net result would be two wakeup events, but only one task woken up. That's a lost wakeup. Now, it's probably the case that the 2.2 network serialisation prevents this from happening - I haven't looked. If not, then we need to use spin_lock_irqsave. Alternatively, we can teach wake_up_process to return a success value if it successfully moved a task onto the runqueue. Test that in __wake_up and keep on going if it's zero. 2.4 needs this as well, BTW, to fix an SMP race where a task is on two waitqueues at the same time. I did a patch which took the "wake_up_process returns success value" approach and it felt clean. I haven't submitted it due to general end-of-year patch exhaustion :) If we elect to not address this problem in 2.2 and to rely upon the network locking then we need to put a BIG FAT warning in the code somewhere, because if someone else tries to use the new wake-one capability, they may not be so lucky. - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: > > > > Other thing about your patch, adding TASK_EXCLUSIVE to > > > wake_up/wake_up_interruptible is useless. > > > > This enables wake_up_all(). > > It is useless as it is in 2.2.19pre2: there's no wake_up_all in 2.2.19pre2. #define wake_up_all(x) __wake_up((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE) > > Anyway, this is all just noise. > > > > The key question is: which of the following do we want? > > > > a) A simple, specific accept()-accelerator, and 2.2 remains without > >an exclusive wq API or > > To make the accellerator we need a minimal wake-one support. So a) doesn't > make sense to me. It makes heaps of sense. We've introduced into 2.2 an API which has the same appearance as one in 2.4, but which is subtly broken wrt the 2.4 one. I suggest you change the names to something other than add_waitqueue_exclusive() and TASK_EXCLUSIVE, add a cautionary comment and then go ahead with your patch. Except for this bit, which looks slightly fatal: /* * We can drop the read-lock early if this * is the only/last process. */ if (next == head) { read_unlock(_lock); wake_up_process(p); goto out; } Once the waitqueue_lock has been dropped, the task at `p' is free to remove itself from the waitqueue and exit. This CPU can then try to wake up a non-existent task, no? - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Wed, Dec 20, 2000 at 02:28:58PM +0100, Andrea Arcangeli wrote: > I was in the process of fixing this (I also just backported the thinkpad > %edx clobber fix), but if somebody is going to work on this please let > me know so we stay in sync. Ok this should fix the e820 memory detection, against 2.2.19pre2: ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.19pre2/e820-fix-1 While fixing the code I noticed some bug was inerith from 2.4.x so I forward ported the fixes to 2.4.0-test13-pre3: ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test13-pre3/e820-fix-1 I also include them below so they're handy for Linus: diff -urN 2.4.0-test13-pre3/arch/i386/kernel/setup.c 2.4.0-test13-pre3-e820/arch/i386/kernel/setup.c --- 2.4.0-test13-pre3/arch/i386/kernel/setup.c Thu Dec 14 22:33:59 2000 +++ 2.4.0-test13-pre3-e820/arch/i386/kernel/setup.c Thu Dec 21 21:12:47 2000 @@ -477,7 +477,7 @@ if (start < 0x10ULL && end > 0xAULL) { if (start < 0xAULL) add_memory_region(start, 0xAULL-start, type); - if (end < 0x10ULL) + if (end <= 0x10ULL) continue; start = 0x10ULL; size = end - start; @@ -518,7 +518,8 @@ e820.nr_map = 0; add_memory_region(0, LOWMEMSIZE(), E820_RAM); - add_memory_region(HIGH_MEMORY, mem_size << 10, E820_RAM); + add_memory_region(HIGH_MEMORY, (mem_size << 10)-HIGH_MEMORY, + E820_RAM); } printk("BIOS-provided physical RAM map:\n"); print_memory_map(who); The above patches doesn't include the fix for the thinkpad from Marc Joosen, a backport of such bugfix is separately backported here (because it's orthogonal with the other bugfixes): ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.19pre2/thinkpad-e820-mjoosen-1 Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Thu, 21 Dec 2000, Andrea Arcangeli wrote: > It would also be nice if you could show a real life > showstopper-production-bottleneck where we need C) to fix it. I > cannot see any useful usage of C in production 2.2.x. Me neither. I'm just wondering at the reason why 2.2 semantics would be different than 2.4 ones now we still have the choice of just cut'n'pasting over the (working) code from 2.4... regards, Rik -- Hollywood goes for world dumbination, Trailer at 11. http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Thu, Dec 21, 2000 at 03:07:08PM -0200, Rik van Riel wrote: > c) will also implement a) in an obviously right and simple way. So go ahead. If you think that's so simple and obviously right you can post here a patch here against 2.2.19pre2 that implements C) to show real facts. My B is here: ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.19pre2/wake-one-2 Then we will see how much C) is obviously right and simple way compared to B). I don't need to see C) implemented to see how much it's obviously right and simple but if you think I'm wrong again: go ahead. It would also be nice if you could show a real life showstopper-production-bottleneck where we need C) to fix it. I cannot see any useful usage of C in production 2.2.x. Doing waitqueues in 2.2.x and 2.4.x is an irrelevant point (keeping the same API and semantics is much better than anything else for 2.2.x unless there's some serious showstopper that isn't possible to fix with B) and that I still cannot see). People backporting drivers from 2.4.x will use wake-all as they had to do during the whole 2.3.x, that's obviously safe and trivial. If they know what they're doing they can also use the 2.2.x wake-one API if their task is registered only in 1 waitqueues (as 99% of usages I'm aware of given whole 2.3.x implemented B too). Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Thu, 21 Dec 2000, Andrea Arcangeli wrote: > > The key question is: which of the following do we want? > > > > a) A simple, specific accept()-accelerator, and 2.2 remains without > >an exclusive wq API or > > To make the accellerator we need a minimal wake-one support. So a) doesn't > make sense to me. > > > b) A general purpose exclusive wq mechanism which does not correctly > >support waiting on two queues simultaneuously where one is > >exclusive or > > That's what we had in whole 2.3.x and that is better suitable for 2.2.x > as it allows to do a) with obviously right approch and minimal effort. > > > c) A general purpose exclusive wq mechanism which _does_ support it. > > > > Each choice has merit! You seem to want b). davem wants c). > > I have not read any email from DaveM who proposes C for > 2.2.19pre3 (or 2.2.x in general). Are you sure he wasn't talking > about 2.4.x? c) will also implement a) in an obviously right and simple way. I've still not seen ANY reason why we'd want 2.2 to have different wake-one semantics from 2.4... > > And given that 2.2 has maybe 2-4 years life left in it, I'd > > I hope no ;). People are still converting their 2.0 systems to 2.2 as we speak. I really doubt that 2.2 has _less_ than 3 years of life left ... as much as I hate this idea ;) > > agree with David. Let's do it once and do it right while the issue > > is fresh in our minds. > > I disagree. The changes to separate the two waitqueues even only > for accept are not suitable for 2.2.x. Alan filter should forbid > that changes. They're simply not worthwhile because I cannot > even think at ... They're not worthwhile just because you can't think of an example ? The same could be said of `b)' above. Do you have an example where that is the preferable semantics ? If both are equally preferable (ie. nobody can think of any example where the corner case is being used), why do you keep insisting on giving 2.2 different wake-one semantics from 2.4 ? [these wake-one semantics may become rather important for people backporting drivers to 2.2 over the next years ... or to something else which nobody has even thought about ... 2-4 years is a long time, so 2.2 maintainability is still an issue] regards, Rik -- Hollywood goes for world dumbination, Trailer at 11. http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Thu, Dec 21, 2000 at 09:38:43PM +1100, Andrew Morton wrote: > Andrea Arcangeli wrote: > > > > The fact you could mix non-exclusive and exlusive wakeups in the same waitqueue > > was a feature not a misfeature. Then of course you cannot register in two > > waitqueues one with wake-one and one with wake-all but who does that anyways? > > Definitely not an issue for 2.2.x. > > Definitely? Let's think about that. If you can tell one showstopper bottleneck in 2.2.x that needs to register in two waitqueues at the same time, one exclusive and one non-exclusive, then I will think about it. > > I think the real reason for spearating the two things as davem proposed is > > because otherwise we cannot register for a LIFO wake-one in O(1) as we needed > > for accept. > > Yes. In other words, if we try to do O(1) LIFO, we can cause lost wakeups. We can do LIFO in O(N) just fine, no lost wakeups with the "mix" way. Only problem is additional complexity in the inserction operation. > > Other thing about your patch, adding TASK_EXCLUSIVE to > > wake_up/wake_up_interruptible is useless. > > This enables wake_up_all(). It is useless as it is in 2.2.19pre2: there's no wake_up_all in 2.2.19pre2. > Anyway, this is all just noise. > > The key question is: which of the following do we want? > > a) A simple, specific accept()-accelerator, and 2.2 remains without >an exclusive wq API or To make the accellerator we need a minimal wake-one support. So a) doesn't make sense to me. > b) A general purpose exclusive wq mechanism which does not correctly >support waiting on two queues simultaneuously where one is >exclusive or That's what we had in whole 2.3.x and that is better suitable for 2.2.x as it allows to do a) with obviously right approch and minimal effort. > c) A general purpose exclusive wq mechanism which _does_ support it. > > Each choice has merit! You seem to want b). davem wants c). I have not read any email from DaveM who proposes C for 2.2.19pre3 (or 2.2.x in general). Are you sure he wasn't talking about 2.4.x? BTW, you also implemented b) in 2.2.19pre2 ;) > And given that 2.2 has maybe 2-4 years life left in it, I'd I hope no ;). > agree with David. Let's do it once and do it right while the issue > is fresh in our minds. I disagree. The changes to separate the two waitqueues even only for accept are not suitable for 2.2.x. Alan filter should forbid that changes. They're simply not worthwhile because I cannot even think at one bottleneck-showstopper place that registers in two waitqueues and wants wake-one from one and wake-all from the other in 2.2.x. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: > > The fact you could mix non-exclusive and exlusive wakeups in the same waitqueue > was a feature not a misfeature. Then of course you cannot register in two > waitqueues one with wake-one and one with wake-all but who does that anyways? > Definitely not an issue for 2.2.x. Definitely? Let's think about that. > I think the real reason for spearating the two things as davem proposed is > because otherwise we cannot register for a LIFO wake-one in O(1) as we needed > for accept. Yes. In other words, if we try to do O(1) LIFO, we can cause lost wakeups. > Other thing about your patch, adding TASK_EXCLUSIVE to > wake_up/wake_up_interruptible is useless. This enables wake_up_all(). Anyway, this is all just noise. The key question is: which of the following do we want? a) A simple, specific accept()-accelerator, and 2.2 remains without an exclusive wq API or b) A general purpose exclusive wq mechanism which does not correctly support waiting on two queues simultaneuously where one is exclusive or c) A general purpose exclusive wq mechanism which _does_ support it. Each choice has merit! You seem to want b). davem wants c). And given that 2.2 has maybe 2-4 years life left in it, I'd agree with David. Let's do it once and do it right while the issue is fresh in our minds. Yes? - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: The fact you could mix non-exclusive and exlusive wakeups in the same waitqueue was a feature not a misfeature. Then of course you cannot register in two waitqueues one with wake-one and one with wake-all but who does that anyways? Definitely not an issue for 2.2.x. Definitely? Let's think about that. I think the real reason for spearating the two things as davem proposed is because otherwise we cannot register for a LIFO wake-one in O(1) as we needed for accept. Yes. In other words, if we try to do O(1) LIFO, we can cause lost wakeups. Other thing about your patch, adding TASK_EXCLUSIVE to wake_up/wake_up_interruptible is useless. This enables wake_up_all(). Anyway, this is all just noise. The key question is: which of the following do we want? a) A simple, specific accept()-accelerator, and 2.2 remains without an exclusive wq API or b) A general purpose exclusive wq mechanism which does not correctly support waiting on two queues simultaneuously where one is exclusive or c) A general purpose exclusive wq mechanism which _does_ support it. Each choice has merit! You seem to want b). davem wants c). And given that 2.2 has maybe 2-4 years life left in it, I'd agree with David. Let's do it once and do it right while the issue is fresh in our minds. Yes? - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Thu, Dec 21, 2000 at 09:38:43PM +1100, Andrew Morton wrote: Andrea Arcangeli wrote: The fact you could mix non-exclusive and exlusive wakeups in the same waitqueue was a feature not a misfeature. Then of course you cannot register in two waitqueues one with wake-one and one with wake-all but who does that anyways? Definitely not an issue for 2.2.x. Definitely? Let's think about that. If you can tell one showstopper bottleneck in 2.2.x that needs to register in two waitqueues at the same time, one exclusive and one non-exclusive, then I will think about it. I think the real reason for spearating the two things as davem proposed is because otherwise we cannot register for a LIFO wake-one in O(1) as we needed for accept. Yes. In other words, if we try to do O(1) LIFO, we can cause lost wakeups. We can do LIFO in O(N) just fine, no lost wakeups with the "mix" way. Only problem is additional complexity in the inserction operation. Other thing about your patch, adding TASK_EXCLUSIVE to wake_up/wake_up_interruptible is useless. This enables wake_up_all(). It is useless as it is in 2.2.19pre2: there's no wake_up_all in 2.2.19pre2. Anyway, this is all just noise. The key question is: which of the following do we want? a) A simple, specific accept()-accelerator, and 2.2 remains without an exclusive wq API or To make the accellerator we need a minimal wake-one support. So a) doesn't make sense to me. b) A general purpose exclusive wq mechanism which does not correctly support waiting on two queues simultaneuously where one is exclusive or That's what we had in whole 2.3.x and that is better suitable for 2.2.x as it allows to do a) with obviously right approch and minimal effort. c) A general purpose exclusive wq mechanism which _does_ support it. Each choice has merit! You seem to want b). davem wants c). I have not read any email from DaveM who proposes C for 2.2.19pre3 (or 2.2.x in general). Are you sure he wasn't talking about 2.4.x? BTW, you also implemented b) in 2.2.19pre2 ;) And given that 2.2 has maybe 2-4 years life left in it, I'd I hope no ;). agree with David. Let's do it once and do it right while the issue is fresh in our minds. I disagree. The changes to separate the two waitqueues even only for accept are not suitable for 2.2.x. Alan filter should forbid that changes. They're simply not worthwhile because I cannot even think at one bottleneck-showstopper place that registers in two waitqueues and wants wake-one from one and wake-all from the other in 2.2.x. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Thu, 21 Dec 2000, Andrea Arcangeli wrote: The key question is: which of the following do we want? a) A simple, specific accept()-accelerator, and 2.2 remains without an exclusive wq API or To make the accellerator we need a minimal wake-one support. So a) doesn't make sense to me. b) A general purpose exclusive wq mechanism which does not correctly support waiting on two queues simultaneuously where one is exclusive or That's what we had in whole 2.3.x and that is better suitable for 2.2.x as it allows to do a) with obviously right approch and minimal effort. c) A general purpose exclusive wq mechanism which _does_ support it. Each choice has merit! You seem to want b). davem wants c). I have not read any email from DaveM who proposes C for 2.2.19pre3 (or 2.2.x in general). Are you sure he wasn't talking about 2.4.x? c) will also implement a) in an obviously right and simple way. I've still not seen ANY reason why we'd want 2.2 to have different wake-one semantics from 2.4... And given that 2.2 has maybe 2-4 years life left in it, I'd I hope no ;). People are still converting their 2.0 systems to 2.2 as we speak. I really doubt that 2.2 has _less_ than 3 years of life left ... as much as I hate this idea ;) agree with David. Let's do it once and do it right while the issue is fresh in our minds. I disagree. The changes to separate the two waitqueues even only for accept are not suitable for 2.2.x. Alan filter should forbid that changes. They're simply not worthwhile because I cannot even think at ... They're not worthwhile just because you can't think of an example ? The same could be said of `b)' above. Do you have an example where that is the preferable semantics ? If both are equally preferable (ie. nobody can think of any example where the corner case is being used), why do you keep insisting on giving 2.2 different wake-one semantics from 2.4 ? [these wake-one semantics may become rather important for people backporting drivers to 2.2 over the next years ... or to something else which nobody has even thought about ... 2-4 years is a long time, so 2.2 maintainability is still an issue] regards, Rik -- Hollywood goes for world dumbination, Trailer at 11. http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Thu, Dec 21, 2000 at 03:07:08PM -0200, Rik van Riel wrote: c) will also implement a) in an obviously right and simple way. So go ahead. If you think that's so simple and obviously right you can post here a patch here against 2.2.19pre2 that implements C) to show real facts. My B is here: ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.19pre2/wake-one-2 Then we will see how much C) is obviously right and simple way compared to B). I don't need to see C) implemented to see how much it's obviously right and simple but if you think I'm wrong again: go ahead. It would also be nice if you could show a real life showstopper-production-bottleneck where we need C) to fix it. I cannot see any useful usage of C in production 2.2.x. Doing waitqueues in 2.2.x and 2.4.x is an irrelevant point (keeping the same API and semantics is much better than anything else for 2.2.x unless there's some serious showstopper that isn't possible to fix with B) and that I still cannot see). People backporting drivers from 2.4.x will use wake-all as they had to do during the whole 2.3.x, that's obviously safe and trivial. If they know what they're doing they can also use the 2.2.x wake-one API if their task is registered only in 1 waitqueues (as 99% of usages I'm aware of given whole 2.3.x implemented B too). Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: Other thing about your patch, adding TASK_EXCLUSIVE to wake_up/wake_up_interruptible is useless. This enables wake_up_all(). It is useless as it is in 2.2.19pre2: there's no wake_up_all in 2.2.19pre2. #define wake_up_all(x) __wake_up((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE) Anyway, this is all just noise. The key question is: which of the following do we want? a) A simple, specific accept()-accelerator, and 2.2 remains without an exclusive wq API or To make the accellerator we need a minimal wake-one support. So a) doesn't make sense to me. It makes heaps of sense. We've introduced into 2.2 an API which has the same appearance as one in 2.4, but which is subtly broken wrt the 2.4 one. I suggest you change the names to something other than add_waitqueue_exclusive() and TASK_EXCLUSIVE, add a cautionary comment and then go ahead with your patch. Except for this bit, which looks slightly fatal: /* * We can drop the read-lock early if this * is the only/last process. */ if (next == head) { read_unlock(waitqueue_lock); wake_up_process(p); goto out; } Once the waitqueue_lock has been dropped, the task at `p' is free to remove itself from the waitqueue and exit. This CPU can then try to wake up a non-existent task, no? - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Wed, Dec 20, 2000 at 03:48:06PM -0200, Rik van Riel wrote: > On Wed, 20 Dec 2000, Andrea Arcangeli wrote: > > On Thu, Dec 21, 2000 at 01:57:15AM +1100, Andrew Morton wrote: > > > If a task is on two waitqueues at the same time it becomes a bug: > > > if the outer waitqueue is non-exclusive and the inner is exclusive, > > > > Your 2.2.x won't allow that either. You set the > > `current->task_exclusive = 1' and so you will get an exclusive > > wakeups in both waitqueues. You simply cannot tell register in > > two waitqueue and expect a non-exlusive wakeup in one and an > > exclusive wakeup in the other one. > > Why not? Having a wake-all wakeup on one event and a > wake-one wakeup on another kind of event seems perfectly > reasonable to me at a first glance. Is there something > I've overlooked ? I think you overlooked, never mind. I only said kernel 2.2.19pre2 won't allow that either. I'm not saying it's impossible to implement or unrasonable. > > > Anyway, it's academic. davem would prefer that we do it properly > > > and move the `exclusive' flag into the waitqueue head to avoid the > > > task-on-two-waitqueues problem, as was done in 2.4. I think he's > > > > The fact you could mix non-exclusive and exlusive wakeups in the > > same waitqueue was a feature not a misfeature. Then of course > > you cannot register in two waitqueues one with wake-one and one > > with wake-all but who does that anyways? > > The "who does that anyways" argument could also be said about > mixing exclusive and non-exclusive wakeups in the same waitqueue. > Doing this seems rather confusing ... would you know any application > which needs this ? wake-one accept(2) in 2.2.x unless you want to rewrite part of the TCP stack to still get select wake-all right (the reason they was mixed in whole 2.3.x was the same reason we _need_ to somehow mix non-excusive and exlusive waiters in the same waitqueue in 2.2.x to provide wake-one in accept). And in 2.2.x the "who does that anyways" is much stronger, since _only_ accept is using the exclusive wakeup. > I think it would be good to have the same semantics in 2.2 as > we have in 2.4. [..] 2.3.0 born for allowing O(1) inserction in the waitqueue because only that change generated and huge amount of details that was not possible to handle in 2.2.x. > [..] If there is a good reason to go with the > semantics Andrea proposed [..] NOTE: I'm only talking about 2.2.19pre2, not 2.4.x. I didn't suggested anything for 2.4.x and I'm totally fine with two different waitqueues. I even wanted to differentiate them too in mid 2.3.x to fix accept that was FIFO but still allowing insertion in the waitqueue in O(1), but didn't had the time to rework the TCP stack and the rest of wake-one users. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Wed, 20 Dec 2000, Andrea Arcangeli wrote: > On Thu, Dec 21, 2000 at 01:57:15AM +1100, Andrew Morton wrote: > > If a task is on two waitqueues at the same time it becomes a bug: > > if the outer waitqueue is non-exclusive and the inner is exclusive, > > Your 2.2.x won't allow that either. You set the > `current->task_exclusive = 1' and so you will get an exclusive > wakeups in both waitqueues. You simply cannot tell register in > two waitqueue and expect a non-exlusive wakeup in one and an > exclusive wakeup in the other one. Why not? Having a wake-all wakeup on one event and a wake-one wakeup on another kind of event seems perfectly reasonable to me at a first glance. Is there something I've overlooked ? > > Anyway, it's academic. davem would prefer that we do it properly > > and move the `exclusive' flag into the waitqueue head to avoid the > > task-on-two-waitqueues problem, as was done in 2.4. I think he's > > The fact you could mix non-exclusive and exlusive wakeups in the > same waitqueue was a feature not a misfeature. Then of course > you cannot register in two waitqueues one with wake-one and one > with wake-all but who does that anyways? The "who does that anyways" argument could also be said about mixing exclusive and non-exclusive wakeups in the same waitqueue. Doing this seems rather confusing ... would you know any application which needs this ? > I think the real reason for spearating the two things as davem > proposed is because otherwise we cannot register for a LIFO > wake-one in O(1) as we needed for accept. Do you have any reason to assume Davem and Linus lied about why they changed the semantics? ;) I'm pretty sure the reason why Linus and Davem changed the semantics is the same reason they mailed to this list ... ;) I think it would be good to have the same semantics in 2.2 as we have in 2.4. Using different semantics will probably only lead to confusion. If there is a good reason to go with the semantics Andrea proposed over the semantics Linus and Davem have in 2.4, we should probably either change 2.4 or use the 2.4 semantics in 2.2 anyway just to avoid the confusion... regards, Rik -- Hollywood goes for world dumbination, Trailer at 11. http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Thu, Dec 21, 2000 at 01:57:15AM +1100, Andrew Morton wrote: > If a task is on two waitqueues at the same time it becomes a bug: > if the outer waitqueue is non-exclusive and the inner is exclusive, Your 2.2.x won't allow that either. You set the `current->task_exclusive = 1' and so you will get an exclusive wakeups in both waitqueues. You simply cannot tell register in two waitqueue and expect a non-exlusive wakeup in one and an exclusive wakeup in the other one. The only design difference (non implementation difference) between my patch and your patch is that you have to clear task_exlusive explicitly. You moved the EXCLUSIVE bitflag out of current->state field. That gives no advantages and it looks ugiler to me. The robusteness point doesn't hold IMHO: as soon as current->state is been changed by somebody else you don't care anymore if it was exclusive wakeup or not. About the fact I mask out the exlusive bit in schedule that's zero cost compared to a cacheline miss, but it also depends if you have more wakeups or schedules (with accept(2) usage there are going to be more schedule than wakeups, but on other usages that could not be the case) but ok, the performance point was nearly irrelevant ;). > Anyway, it's academic. davem would prefer that we do it properly > and move the `exclusive' flag into the waitqueue head to avoid the > task-on-two-waitqueues problem, as was done in 2.4. I think he's The fact you could mix non-exclusive and exlusive wakeups in the same waitqueue was a feature not a misfeature. Then of course you cannot register in two waitqueues one with wake-one and one with wake-all but who does that anyways? Definitely not an issue for 2.2.x. I think the real reason for spearating the two things as davem proposed is because otherwise we cannot register for a LIFO wake-one in O(1) as we needed for accept. Other thing about your patch, adding TASK_EXCLUSIVE to wake_up/wake_up_interruptible is useless. You kind of mixed the two things at the source level. In your patch TASK_EXCLUSIVE should not be defined. Last thing the wmb() in accept wasn't necessary. At that point you don't care at all what the wakeup can see. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: > > > o wake_one semantics for accept() (Andrew Morton) > > I dislike the implementation. I stick with my faster and nicer implementation > that was just included in aa kernels for some time (2.2.18aa2 included it for > example). Andrew, I guess you didn't noticed I just implemented the wakeone for > accept. (I just ported it on top of 2.2.19pre2 after backing out the wakeone in > pre2) Yes, I noticed your implementation a few weeks back. 'fraid I never liked the use of the TASK_EXCLUSIVE bit in task_struct.state. It's an enumerated state, not an aggregation of flags. Most of the kernel treats it as an enumerated state and so will happily clear the TASK_EXCLUSIVE bit without masking it off. Fragile. If a task is on two waitqueues at the same time it becomes a bug: if the outer waitqueue is non-exclusive and the inner is exclusive, the task suddenly becomes exclusive on the outer one and converts it from wake-all to wake-some! Is it faster? Not sure. You've saved a cacheline read in __wake_up (which was in fact a preload, if you look at what comes later) at the cost of having to mask out the bit in current->state every time we schedule(). Anyway, it's academic. davem would prefer that we do it properly and move the `exclusive' flag into the waitqueue head to avoid the task-on-two-waitqueues problem, as was done in 2.4. I think he's right. Do you? - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Sat, Dec 16, 2000 at 07:11:47PM +, Alan Cox wrote: > o E820 memory detect backport from 2.4(Michael Chen) It's broken, it will crash machines: for (i = 0; i < e820.nr_map; i++) { unsigned long start, end; /* RAM? */ if (e820.map[i].type != E820_RAM) continue; start = PFN_UP(e820.map[i].addr); end = PFN_DOWN(e820.map[i].addr + e820.map[i].size); if (start >= end) continue; if (end > max_pfn) max_pfn = end; } memory_end = (max_pfn << PAGE_SHIFT); this will threat non-RAM holes as RAM. On 2.4.x we do a different things, that is we collect the max_pfn but then we don't assume that there are no holes between 1M and max_pfn ;), we instead fill the bootmem allocator _only_ with E820_RAM segments. I was in the process of fixing this (I also just backported the thinkpad %edx clobber fix), but if somebody is going to work on this please let me know so we stay in sync. > o wake_one semantics for accept() (Andrew Morton) I dislike the implementation. I stick with my faster and nicer implementation that was just included in aa kernels for some time (2.2.18aa2 included it for example). Andrew, I guess you didn't noticed I just implemented the wakeone for accept. (I just ported it on top of 2.2.19pre2 after backing out the wakeone in pre2) Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Wed, Dec 20, 2000 at 12:32:54PM +0200, Petri Kaukasoina wrote: > OK, I booted 2.4.0-test12 which even prints that list: > > BIOS-provided physical RAM map: > BIOS-e820: 0009fc00 @ (usable) > BIOS-e820: 0400 @ 0009fc00 (reserved) > BIOS-e820: 0348 @ 0010 (usable) > BIOS-e820: 0010 @ fec0 (reserved) > BIOS-e820: 0010 @ fee0 (reserved) > BIOS-e820: 0001 @ (reserved) > Memory: 52232k/54784k available (831k kernel code, 2164k reserved, 62k data, 168k >init, 0k highmem) > > The last three reserved lines correspond to the missing 2.5 Megs. What are > they? Data reserved by your system for whatever purpose. Most probably ACPI data or similar. > 2.2.18 sees all 56 Megs and works ok and after adding mem=56M on the > kernel command line even 2.2.19pre2 works ok with all the 56 Megs. No > crashes. If you would have ACPI events, you would probably run into trouble. Apart from this, chances are that the reserved data is not needed by Linux and never accessed by the BIOS, so you may get away with using the reserved memory. The safe way is to respect the BIOS' RAM map. Regards, -- Kurt Garloff <[EMAIL PROTECTED]> Eindhoven, NL GPG key: See mail header, key servers Linux kernel development SuSE GmbH, Nuernberg, FRG SCSI, Security PGP signature
Re: Linux 2.2.19pre2
On Sun, Dec 17, 2000 at 04:38:02PM +0100, Kurt Garloff wrote: > On Sun, Dec 17, 2000 at 12:56:56PM +0200, Petri Kaukasoina wrote: > > I guess the new memory detect does not work correctly with my old work > > horse. It is a 100 MHz pentium with 56 Megs RAM. AMIBIOS dated 10/10/94 with > > a version number of 51-000-0001169_0011-101094-SIS550X-H. > > > > 2.2.18 reports: > > Memory: 55536k/57344k available (624k kernel code, 412k reserved, 732k data, 40k >init) > > > > 2.2.19pre2 reports: > > Memory: 53000k/54784k available (628k kernel code, 408k reserved, 708k data, 40k >init) > > > > 57344k is 56 Megs which is correct. > > 54784k is only 53.5 Megs. > > It's this patch that changes things for you: > o E820 memory detect backport from 2.4(Michael Chen) > > The E820 memory detection parses a list from the BIOS, which specifies > the amount of memory, holes, reserved regions, ... > Apparently, your BIOS does not do it completely correctly; otherwise you > should have had crashes before ... OK, I booted 2.4.0-test12 which even prints that list: BIOS-provided physical RAM map: BIOS-e820: 0009fc00 @ (usable) BIOS-e820: 0400 @ 0009fc00 (reserved) BIOS-e820: 0348 @ 0010 (usable) BIOS-e820: 0010 @ fec0 (reserved) BIOS-e820: 0010 @ fee0 (reserved) BIOS-e820: 0001 @ (reserved) Memory: 52232k/54784k available (831k kernel code, 2164k reserved, 62k data, 168k init, 0k highmem) The last three reserved lines correspond to the missing 2.5 Megs. What are they? 2.2.18 sees all 56 Megs and works ok and after adding mem=56M on the kernel command line even 2.2.19pre2 works ok with all the 56 Megs. No crashes. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Sun, Dec 17, 2000 at 04:38:02PM +0100, Kurt Garloff wrote: On Sun, Dec 17, 2000 at 12:56:56PM +0200, Petri Kaukasoina wrote: I guess the new memory detect does not work correctly with my old work horse. It is a 100 MHz pentium with 56 Megs RAM. AMIBIOS dated 10/10/94 with a version number of 51-000-0001169_0011-101094-SIS550X-H. 2.2.18 reports: Memory: 55536k/57344k available (624k kernel code, 412k reserved, 732k data, 40k init) 2.2.19pre2 reports: Memory: 53000k/54784k available (628k kernel code, 408k reserved, 708k data, 40k init) 57344k is 56 Megs which is correct. 54784k is only 53.5 Megs. It's this patch that changes things for you: o E820 memory detect backport from 2.4(Michael Chen) The E820 memory detection parses a list from the BIOS, which specifies the amount of memory, holes, reserved regions, ... Apparently, your BIOS does not do it completely correctly; otherwise you should have had crashes before ... OK, I booted 2.4.0-test12 which even prints that list: BIOS-provided physical RAM map: BIOS-e820: 0009fc00 @ (usable) BIOS-e820: 0400 @ 0009fc00 (reserved) BIOS-e820: 0348 @ 0010 (usable) BIOS-e820: 0010 @ fec0 (reserved) BIOS-e820: 0010 @ fee0 (reserved) BIOS-e820: 0001 @ (reserved) Memory: 52232k/54784k available (831k kernel code, 2164k reserved, 62k data, 168k init, 0k highmem) The last three reserved lines correspond to the missing 2.5 Megs. What are they? 2.2.18 sees all 56 Megs and works ok and after adding mem=56M on the kernel command line even 2.2.19pre2 works ok with all the 56 Megs. No crashes. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Wed, Dec 20, 2000 at 12:32:54PM +0200, Petri Kaukasoina wrote: OK, I booted 2.4.0-test12 which even prints that list: BIOS-provided physical RAM map: BIOS-e820: 0009fc00 @ (usable) BIOS-e820: 0400 @ 0009fc00 (reserved) BIOS-e820: 0348 @ 0010 (usable) BIOS-e820: 0010 @ fec0 (reserved) BIOS-e820: 0010 @ fee0 (reserved) BIOS-e820: 0001 @ (reserved) Memory: 52232k/54784k available (831k kernel code, 2164k reserved, 62k data, 168k init, 0k highmem) The last three reserved lines correspond to the missing 2.5 Megs. What are they? Data reserved by your system for whatever purpose. Most probably ACPI data or similar. 2.2.18 sees all 56 Megs and works ok and after adding mem=56M on the kernel command line even 2.2.19pre2 works ok with all the 56 Megs. No crashes. If you would have ACPI events, you would probably run into trouble. Apart from this, chances are that the reserved data is not needed by Linux and never accessed by the BIOS, so you may get away with using the reserved memory. The safe way is to respect the BIOS' RAM map. Regards, -- Kurt Garloff [EMAIL PROTECTED] Eindhoven, NL GPG key: See mail header, key servers Linux kernel development SuSE GmbH, Nuernberg, FRG SCSI, Security PGP signature
Re: Linux 2.2.19pre2
On Sat, Dec 16, 2000 at 07:11:47PM +, Alan Cox wrote: o E820 memory detect backport from 2.4(Michael Chen) It's broken, it will crash machines: for (i = 0; i e820.nr_map; i++) { unsigned long start, end; /* RAM? */ if (e820.map[i].type != E820_RAM) continue; start = PFN_UP(e820.map[i].addr); end = PFN_DOWN(e820.map[i].addr + e820.map[i].size); if (start = end) continue; if (end max_pfn) max_pfn = end; } memory_end = (max_pfn PAGE_SHIFT); this will threat non-RAM holes as RAM. On 2.4.x we do a different things, that is we collect the max_pfn but then we don't assume that there are no holes between 1M and max_pfn ;), we instead fill the bootmem allocator _only_ with E820_RAM segments. I was in the process of fixing this (I also just backported the thinkpad %edx clobber fix), but if somebody is going to work on this please let me know so we stay in sync. o wake_one semantics for accept() (Andrew Morton) I dislike the implementation. I stick with my faster and nicer implementation that was just included in aa kernels for some time (2.2.18aa2 included it for example). Andrew, I guess you didn't noticed I just implemented the wakeone for accept. (I just ported it on top of 2.2.19pre2 after backing out the wakeone in pre2) Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Andrea Arcangeli wrote: o wake_one semantics for accept() (Andrew Morton) I dislike the implementation. I stick with my faster and nicer implementation that was just included in aa kernels for some time (2.2.18aa2 included it for example). Andrew, I guess you didn't noticed I just implemented the wakeone for accept. (I just ported it on top of 2.2.19pre2 after backing out the wakeone in pre2) Yes, I noticed your implementation a few weeks back. 'fraid I never liked the use of the TASK_EXCLUSIVE bit in task_struct.state. It's an enumerated state, not an aggregation of flags. Most of the kernel treats it as an enumerated state and so will happily clear the TASK_EXCLUSIVE bit without masking it off. Fragile. If a task is on two waitqueues at the same time it becomes a bug: if the outer waitqueue is non-exclusive and the inner is exclusive, the task suddenly becomes exclusive on the outer one and converts it from wake-all to wake-some! Is it faster? Not sure. You've saved a cacheline read in __wake_up (which was in fact a preload, if you look at what comes later) at the cost of having to mask out the bit in current-state every time we schedule(). Anyway, it's academic. davem would prefer that we do it properly and move the `exclusive' flag into the waitqueue head to avoid the task-on-two-waitqueues problem, as was done in 2.4. I think he's right. Do you? - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Thu, Dec 21, 2000 at 01:57:15AM +1100, Andrew Morton wrote: If a task is on two waitqueues at the same time it becomes a bug: if the outer waitqueue is non-exclusive and the inner is exclusive, Your 2.2.x won't allow that either. You set the `current-task_exclusive = 1' and so you will get an exclusive wakeups in both waitqueues. You simply cannot tell register in two waitqueue and expect a non-exlusive wakeup in one and an exclusive wakeup in the other one. The only design difference (non implementation difference) between my patch and your patch is that you have to clear task_exlusive explicitly. You moved the EXCLUSIVE bitflag out of current-state field. That gives no advantages and it looks ugiler to me. The robusteness point doesn't hold IMHO: as soon as current-state is been changed by somebody else you don't care anymore if it was exclusive wakeup or not. About the fact I mask out the exlusive bit in schedule that's zero cost compared to a cacheline miss, but it also depends if you have more wakeups or schedules (with accept(2) usage there are going to be more schedule than wakeups, but on other usages that could not be the case) but ok, the performance point was nearly irrelevant ;). Anyway, it's academic. davem would prefer that we do it properly and move the `exclusive' flag into the waitqueue head to avoid the task-on-two-waitqueues problem, as was done in 2.4. I think he's The fact you could mix non-exclusive and exlusive wakeups in the same waitqueue was a feature not a misfeature. Then of course you cannot register in two waitqueues one with wake-one and one with wake-all but who does that anyways? Definitely not an issue for 2.2.x. I think the real reason for spearating the two things as davem proposed is because otherwise we cannot register for a LIFO wake-one in O(1) as we needed for accept. Other thing about your patch, adding TASK_EXCLUSIVE to wake_up/wake_up_interruptible is useless. You kind of mixed the two things at the source level. In your patch TASK_EXCLUSIVE should not be defined. Last thing the wmb() in accept wasn't necessary. At that point you don't care at all what the wakeup can see. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Wed, 20 Dec 2000, Andrea Arcangeli wrote: On Thu, Dec 21, 2000 at 01:57:15AM +1100, Andrew Morton wrote: If a task is on two waitqueues at the same time it becomes a bug: if the outer waitqueue is non-exclusive and the inner is exclusive, Your 2.2.x won't allow that either. You set the `current-task_exclusive = 1' and so you will get an exclusive wakeups in both waitqueues. You simply cannot tell register in two waitqueue and expect a non-exlusive wakeup in one and an exclusive wakeup in the other one. Why not? Having a wake-all wakeup on one event and a wake-one wakeup on another kind of event seems perfectly reasonable to me at a first glance. Is there something I've overlooked ? Anyway, it's academic. davem would prefer that we do it properly and move the `exclusive' flag into the waitqueue head to avoid the task-on-two-waitqueues problem, as was done in 2.4. I think he's The fact you could mix non-exclusive and exlusive wakeups in the same waitqueue was a feature not a misfeature. Then of course you cannot register in two waitqueues one with wake-one and one with wake-all but who does that anyways? The "who does that anyways" argument could also be said about mixing exclusive and non-exclusive wakeups in the same waitqueue. Doing this seems rather confusing ... would you know any application which needs this ? I think the real reason for spearating the two things as davem proposed is because otherwise we cannot register for a LIFO wake-one in O(1) as we needed for accept. Do you have any reason to assume Davem and Linus lied about why they changed the semantics? ;) I'm pretty sure the reason why Linus and Davem changed the semantics is the same reason they mailed to this list ... ;) I think it would be good to have the same semantics in 2.2 as we have in 2.4. Using different semantics will probably only lead to confusion. If there is a good reason to go with the semantics Andrea proposed over the semantics Linus and Davem have in 2.4, we should probably either change 2.4 or use the 2.4 semantics in 2.2 anyway just to avoid the confusion... regards, Rik -- Hollywood goes for world dumbination, Trailer at 11. http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
On Wed, Dec 20, 2000 at 03:48:06PM -0200, Rik van Riel wrote: On Wed, 20 Dec 2000, Andrea Arcangeli wrote: On Thu, Dec 21, 2000 at 01:57:15AM +1100, Andrew Morton wrote: If a task is on two waitqueues at the same time it becomes a bug: if the outer waitqueue is non-exclusive and the inner is exclusive, Your 2.2.x won't allow that either. You set the `current-task_exclusive = 1' and so you will get an exclusive wakeups in both waitqueues. You simply cannot tell register in two waitqueue and expect a non-exlusive wakeup in one and an exclusive wakeup in the other one. Why not? Having a wake-all wakeup on one event and a wake-one wakeup on another kind of event seems perfectly reasonable to me at a first glance. Is there something I've overlooked ? I think you overlooked, never mind. I only said kernel 2.2.19pre2 won't allow that either. I'm not saying it's impossible to implement or unrasonable. Anyway, it's academic. davem would prefer that we do it properly and move the `exclusive' flag into the waitqueue head to avoid the task-on-two-waitqueues problem, as was done in 2.4. I think he's The fact you could mix non-exclusive and exlusive wakeups in the same waitqueue was a feature not a misfeature. Then of course you cannot register in two waitqueues one with wake-one and one with wake-all but who does that anyways? The "who does that anyways" argument could also be said about mixing exclusive and non-exclusive wakeups in the same waitqueue. Doing this seems rather confusing ... would you know any application which needs this ? wake-one accept(2) in 2.2.x unless you want to rewrite part of the TCP stack to still get select wake-all right (the reason they was mixed in whole 2.3.x was the same reason we _need_ to somehow mix non-excusive and exlusive waiters in the same waitqueue in 2.2.x to provide wake-one in accept). And in 2.2.x the "who does that anyways" is much stronger, since _only_ accept is using the exclusive wakeup. I think it would be good to have the same semantics in 2.2 as we have in 2.4. [..] 2.3.0 born for allowing O(1) inserction in the waitqueue because only that change generated and huge amount of details that was not possible to handle in 2.2.x. [..] If there is a good reason to go with the semantics Andrea proposed [..] NOTE: I'm only talking about 2.2.19pre2, not 2.4.x. I didn't suggested anything for 2.4.x and I'm totally fine with two different waitqueues. I even wanted to differentiate them too in mid 2.3.x to fix accept that was FIFO but still allowing insertion in the waitqueue in O(1), but didn't had the time to rework the TCP stack and the rest of wake-one users. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Hi, On Sun, Dec 17, 2000 at 12:56:56PM +0200, Petri Kaukasoina wrote: > I guess the new memory detect does not work correctly with my old work > horse. It is a 100 MHz pentium with 56 Megs RAM. AMIBIOS dated 10/10/94 with > a version number of 51-000-0001169_0011-101094-SIS550X-H. > > 2.2.18 reports: > Memory: 55536k/57344k available (624k kernel code, 412k reserved, 732k data, 40k >init) > > 2.2.19pre2 reports: > Memory: 53000k/54784k available (628k kernel code, 408k reserved, 708k data, 40k >init) > > 57344k is 56 Megs which is correct. > 54784k is only 53.5 Megs. It's this patch that changes things for you: o E820 memory detect backport from 2.4(Michael Chen) The E820 memory detection parses a list from the BIOS, which specifies the amount of memory, holes, reserved regions, ... Apparently, your BIOS does not do it completely correctly; otherwise you should have had crashes before ... Regards, -- Kurt Garloff <[EMAIL PROTECTED]> Eindhoven, NL GPG key: See mail header, key servers Linux kernel development SuSE GmbH, Nuernberg, FRG SCSI, Security PGP signature
Re: Linux 2.2.19pre2
I guess the new memory detect does not work correctly with my old work horse. It is a 100 MHz pentium with 56 Megs RAM. AMIBIOS dated 10/10/94 with a version number of 51-000-0001169_0011-101094-SIS550X-H. 2.2.18 reports: Memory: 55536k/57344k available (624k kernel code, 412k reserved, 732k data, 40k init) 2.2.19pre2 reports: Memory: 53000k/54784k available (628k kernel code, 408k reserved, 708k data, 40k init) 57344k is 56 Megs which is correct. 54784k is only 53.5 Megs. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
I guess the new memory detect does not work correctly with my old work horse. It is a 100 MHz pentium with 56 Megs RAM. AMIBIOS dated 10/10/94 with a version number of 51-000-0001169_0011-101094-SIS550X-H. 2.2.18 reports: Memory: 55536k/57344k available (624k kernel code, 412k reserved, 732k data, 40k init) 2.2.19pre2 reports: Memory: 53000k/54784k available (628k kernel code, 408k reserved, 708k data, 40k init) 57344k is 56 Megs which is correct. 54784k is only 53.5 Megs. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.2.19pre2
Hi, On Sun, Dec 17, 2000 at 12:56:56PM +0200, Petri Kaukasoina wrote: I guess the new memory detect does not work correctly with my old work horse. It is a 100 MHz pentium with 56 Megs RAM. AMIBIOS dated 10/10/94 with a version number of 51-000-0001169_0011-101094-SIS550X-H. 2.2.18 reports: Memory: 55536k/57344k available (624k kernel code, 412k reserved, 732k data, 40k init) 2.2.19pre2 reports: Memory: 53000k/54784k available (628k kernel code, 408k reserved, 708k data, 40k init) 57344k is 56 Megs which is correct. 54784k is only 53.5 Megs. It's this patch that changes things for you: o E820 memory detect backport from 2.4(Michael Chen) The E820 memory detection parses a list from the BIOS, which specifies the amount of memory, holes, reserved regions, ... Apparently, your BIOS does not do it completely correctly; otherwise you should have had crashes before ... Regards, -- Kurt Garloff [EMAIL PROTECTED] Eindhoven, NL GPG key: See mail header, key servers Linux kernel development SuSE GmbH, Nuernberg, FRG SCSI, Security PGP signature