On 08/20, Peter Zijlstra wrote:
>
> On Tue, Aug 20, 2013 at 06:33:12PM +0200, Oleg Nesterov wrote:
> > --- x/kernel/sched/core.c
> > +++ x/kernel/sched/core.c
> > @@ -2435,6 +2435,9 @@ need_resched:
> >  		rq->curr = next;
> >  		++*switch_count;
> >
> > +		if (unlikely(prev->in_iowait))
> > +			rq->nr_iowait++;
> > +
> >  		context_switch(rq, prev, next); /* unlocks the rq */
> >  		/*
> >  		 * The context switch have flipped the stack from under us
> > @@ -2442,6 +2445,12 @@ need_resched:
> >  		 * this task called schedule() in the past. prev == current
> >  		 * is still correct, but it can be moved to another cpu/rq.
> >  		 */
> > +		if (unlikely(prev->in_iowait)) {
> > +			raw_spin_lock_irq(&rq->lock);
> > +			rq->nr_iowait--;
> > +			raw_spin_unlock_irq(&rq->lock);
> > +		}
>
> This seems like the wrong place, this is where you return from
> schedule() running another task,

Yes, but prev is current, and rq should be "correct" for rq->nr_iowait-- ?
This local var should be equal to its value when this task called
context_switch() in the past. Like any other variable, like
"rq = raw_rq()" in io_schedule().

> not where the task you just send to
> sleep wakes up.

Sure, but currently io_schedule() does the same.

Btw. Whatever we do, can't we unify io_schedule/io_schedule_timeout?

Oleg.

--- x/kernel/sched/core.c
+++ x/kernel/sched/core.c
@@ -3939,16 +3939,7 @@ EXPORT_SYMBOL_GPL(yield_to);
  */
 void __sched io_schedule(void)
 {
-	struct rq *rq = raw_rq();
-
-	delayacct_blkio_start();
-	atomic_inc(&rq->nr_iowait);
-	blk_flush_plug(current);
-	current->in_iowait = 1;
-	schedule();
-	current->in_iowait = 0;
-	atomic_dec(&rq->nr_iowait);
-	delayacct_blkio_end();
+	io_schedule_timeout(MAX_SCHEDULE_TIMEOUT);
 }
 EXPORT_SYMBOL(io_schedule);
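
For illustration only, here is a userspace sketch of the two points above: the rq pointer is a local captured before the sleep, so the decrement after wakeup hits the same rq that was incremented, and io_schedule() can be a thin wrapper around the timeout variant. This is not kernel code. Every model_* name is hypothetical, the counter is a plain int rather than an atomic, the delayacct and plug-flush calls are left out, and model_schedule_timeout() is a stub that only records what an observer would see while the task "sleeps". MODEL_MAX_SCHEDULE_TIMEOUT is assumed to be LONG_MAX, as MAX_SCHEDULE_TIMEOUT is in the kernel.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: stands in for the kernel's MAX_SCHEDULE_TIMEOUT. */
#define MODEL_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical stand-in for struct rq; only the field discussed here. */
struct model_rq {
	int nr_iowait;     /* tasks currently accounted as waiting on I/O */
	int seen_in_sleep; /* nr_iowait as observed while the task "sleeps" */
};

/* Hypothetical stand-in for the task's in_iowait flag. */
struct model_task {
	int in_iowait;
};

static struct model_rq model_this_rq;
static struct model_task model_current;

/*
 * Stub for schedule_timeout(): no real sleeping happens. It records the
 * counter value during the "sleep" and returns the timeout unchanged for
 * the infinite case, 0 (timer expired) otherwise.
 */
static long model_schedule_timeout(long timeout)
{
	model_this_rq.seen_in_sleep = model_this_rq.nr_iowait;
	return timeout == MODEL_MAX_SCHEDULE_TIMEOUT ? timeout : 0;
}

long model_io_schedule_timeout(long timeout)
{
	/* Local captured before sleeping; still the same rq after wakeup. */
	struct model_rq *rq = &model_this_rq;
	long ret;

	rq->nr_iowait++;
	model_current.in_iowait = 1;
	ret = model_schedule_timeout(timeout);
	model_current.in_iowait = 0;
	assert(rq->nr_iowait > 0); /* the increment above must be balanced */
	rq->nr_iowait--;
	return ret;
}

/* The unification: the untimed variant just passes the infinite timeout. */
void model_io_schedule(void)
{
	model_io_schedule_timeout(MODEL_MAX_SCHEDULE_TIMEOUT);
}
```

Calling model_io_schedule() leaves nr_iowait back at zero, while the stub sees it raised during the sleep, which is the same accounting window that io_schedule() provides today.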