On Fri, Feb 06, 2015 at 08:02:41AM -0800, Linus Torvalds wrote: > On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra <pet...@infradead.org> wrote: > > > > Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody > > cares about that barrier, so make it go away. > > I'd rather not mix this with the patch, and wonder if we should just > do that globally with some preprocessor magic. We do have a fair > number of "set_current_state(TASK_RUNNING)"
138 > and at least for the > *documented* reason for the memory barrier, all of them could/should > be barrier-less. > > So something like > > if (__is_constant_p(state) && state == TASK_RUNNING) > tsk->state = state; > else > set_mb(tsk->state, state); > > might be more general solution than randomly doing one at a time when > changing code around it.. Yeah, or we could do the coccinelle thing and do a mass conversion. I like the macro one though; I worry a wee bit about non-documented cases through. If someone is doing something way subtle we'll break it :/ --- Subject: sched: Avoid the full memory-barrier for set_current_state(TASK_RUNNING) One should never need the full memory barrier implied by set_current_state() to set TASK_RUNNING for the documented reason of avoiding races against wakeup. Suggested-by: Linus Torvalds <torva...@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org> --- diff --git a/include/linux/sched.h b/include/linux/sched.h index 8db31ef98d2f..aea44c4eeed8 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -243,6 +243,27 @@ extern char ___assert_task_state[1 - 2*!!( ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \ (task->flags & PF_FROZEN) == 0) +/* + * set_current_state() includes a barrier so that the write of current->state + * is correctly serialised wrt the caller's subsequent test of whether to + * actually sleep: + * + * set_current_state(TASK_UNINTERRUPTIBLE); + * if (do_i_need_to_sleep()) + * schedule(); + * + * If the caller does not need such serialisation then use + * __set_current_state(). This is always true for TASK_RUNNING since + * there is no race against wakeup -- both write the same value. + */ +#define ___set_current_state(state) \ +do { \ + if (__is_constant_p(state) && (state) == TASK_RUNNING) \ + current->state = (state); \ + else \ + set_mb(current->state, (state)); \ +} while (0) + #ifdef CONFIG_DEBUG_ATOMIC_SLEEP #define __set_task_state(tsk, state_value) \ @@ -256,17 +277,6 @@ extern char ___assert_task_state[1 - 2*!!( set_mb((tsk)->state, (state_value)); \ } while (0) -/* - * set_current_state() includes a barrier so that the write of current->state - * is correctly serialised wrt the caller's subsequent test of whether to - * actually sleep: - * - * set_current_state(TASK_UNINTERRUPTIBLE); - * if (do_i_need_to_sleep()) - * schedule(); - * - * If the caller does not need such serialisation then use __set_current_state() - */ #define __set_current_state(state_value) \ do { \ current->task_state_change = _THIS_IP_; \ @@ -275,7 +285,7 @@ extern char ___assert_task_state[1 - 2*!!( #define set_current_state(state_value) \ do { \ current->task_state_change = _THIS_IP_; \ - set_mb(current->state, (state_value)); \ + ___set_current_state(state_value); \ } while (0) #else @@ -285,21 +295,10 @@ extern char ___assert_task_state[1 - 2*!!( #define set_task_state(tsk, state_value) \ set_mb((tsk)->state, (state_value)) -/* - * set_current_state() includes a barrier so that the write of current->state - * is correctly serialised wrt the caller's subsequent test of whether to - * actually sleep: - * - * set_current_state(TASK_UNINTERRUPTIBLE); - * if (do_i_need_to_sleep()) - * schedule(); - * - * If the caller does not need such serialisation then use __set_current_state() - */ #define __set_current_state(state_value) \ do { current->state = (state_value); } while (0) #define set_current_state(state_value) \ - set_mb(current->state, (state_value)) + ___set_current_state(state_value); #endif -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/