On Fri, Feb 06, 2015 at 08:02:41AM -0800, Linus Torvalds wrote:
> On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> >
> > Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody
> > cares about that barrier, so make it go away.
> 
> I'd rather not mix this with the patch, and wonder if we should just
> do that globally with some preprocessor magic. We do have a fair
> number of "set_current_state(TASK_RUNNING)" 

138

> and at least for the
> *documented* reason for the memory barrier, all of them could/should
> be barrier-less.
> 
> So something like
> 
>     if (__is_constant_p(state) && state == TASK_RUNNING)
>         tsk->state = state;
>     else
>         set_mb(tsk->state, state);
> 
> might be more general solution than randomly doing one at a time when
> changing code around it..

Yeah, or we could do the coccinelle thing and do a mass conversion.

I like the macro one though; I worry a wee bit about non-documented
cases through. If someone is doing something way subtle we'll break it
:/


---
Subject: sched: Avoid the full memory-barrier for 
set_current_state(TASK_RUNNING)

One should never need the full memory barrier implied by
set_current_state() to set TASK_RUNNING for the documented reason of
avoiding races against wakeup.

Suggested-by: Linus Torvalds <torva...@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db31ef98d2f..aea44c4eeed8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -243,6 +243,27 @@ extern char ___assert_task_state[1 - 2*!!(
                                ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
                                 (task->flags & PF_FROZEN) == 0)
 
+/*
+ * set_current_state() includes a barrier so that the write of current->state
+ * is correctly serialised wrt the caller's subsequent test of whether to
+ * actually sleep:
+ *
+ *     set_current_state(TASK_UNINTERRUPTIBLE);
+ *     if (do_i_need_to_sleep())
+ *             schedule();
+ *
+ * If the caller does not need such serialisation then use
+ * __set_current_state(). This is always true for TASK_RUNNING since
+ * there is no race against wakeup -- both write the same value.
+ */
+#define ___set_current_state(state)                            \
+do {                                                           \
+       if (__is_constant_p(state) && (state) == TASK_RUNNING)  \
+               current->state = (state);                       \
+       else                                                    \
+               set_mb(current->state, (state));                \
+} while (0)
+
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 
 #define __set_task_state(tsk, state_value)                     \
@@ -256,17 +277,6 @@ extern char ___assert_task_state[1 - 2*!!(
                set_mb((tsk)->state, (state_value));            \
        } while (0)
 
-/*
- * set_current_state() includes a barrier so that the write of current->state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- *     set_current_state(TASK_UNINTERRUPTIBLE);
- *     if (do_i_need_to_sleep())
- *             schedule();
- *
- * If the caller does not need such serialisation then use 
__set_current_state()
- */
 #define __set_current_state(state_value)                       \
        do {                                                    \
                current->task_state_change = _THIS_IP_;         \
@@ -275,7 +285,7 @@ extern char ___assert_task_state[1 - 2*!!(
 #define set_current_state(state_value)                         \
        do {                                                    \
                current->task_state_change = _THIS_IP_;         \
-               set_mb(current->state, (state_value));          \
+               ___set_current_state(state_value);              \
        } while (0)
 
 #else
@@ -285,21 +295,10 @@ extern char ___assert_task_state[1 - 2*!!(
 #define set_task_state(tsk, state_value)               \
        set_mb((tsk)->state, (state_value))
 
-/*
- * set_current_state() includes a barrier so that the write of current->state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- *     set_current_state(TASK_UNINTERRUPTIBLE);
- *     if (do_i_need_to_sleep())
- *             schedule();
- *
- * If the caller does not need such serialisation then use 
__set_current_state()
- */
 #define __set_current_state(state_value)               \
        do { current->state = (state_value); } while (0)
 #define set_current_state(state_value)                 \
-       set_mb(current->state, (state_value))
+       ___set_current_state(state_value);
 
 #endif
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to