On 03/14, Andrew Morton wrote: > > On Wed, 13 Mar 2013 18:47:05 +0100 Oleg Nesterov <o...@redhat.com> wrote: > > > This means that orderly_poweroff() becomes async even if we do not > > run the command and always succeeds, schedule_work() can only fail > > if the work is already pending. We can export __orderly_poweroff() > > and change the non-atomic callers which want the old semantics. > > > > ... > > > > @@ -2218,21 +2237,9 @@ static int __orderly_poweroff(void) > > */ > > int orderly_poweroff(bool force) > > { > > + if (force) /* do not override the pending "true" */ > > + poweroff_force = true; > > + schedule_work(&poweroff_work); > > + return 0; > > } > > afaict the current version of orderly_poweroff() will never return - > either __orderly_poweroff() will block until the machine shuts down or > kernel_power_off() will do so.
Note that __orderly_poweroff() uses UMH_WAIT_EXEC, not UMH_WAIT_PROC, so it returns right after /sbin/poweroff starts to execute. So it is already asynchronous unless execve() fails. > However with this patch there is a path via which orderly_poweroff() > can return to its caller, I think? See above, but please also read the changelog. With this patch orderly_poweroff() is always async, even if exec fails, but > If so, the caller might be rather > surprised and we're exercising never-before-used code paths. In fact > if the surprised caller goes oops, the poweroff might not occur at all. This should not happen. Anyway. Please also note that now we can export __orderly_poweroff() and probably change it, it can have another argument "bool use_UMH_WAIT_PROC". int __orderly_poweroff(bool force, bool sync) { int wait = sync ? UMH_WAIT_EXEC : UMH_WAIT_EXEC; ret = call_usermodehelper(argv[0], argv, envp, wait); if (force) { // EXEC failed or /sbin/poweroff didn't do its work if (ret || sync) kernel_power_off(); } } The non-atomic callers can use __orderly_poweroff(sync => true). --------------------------------------------------------------------------- And, Andrew, et all... Could you help with another mentioned problem? It is really trivial, but exactly because it is trivial I do not know what should I do. To remind, say, argv_split(poweroff_cmd) can race with sysctl changing this string, in this case it can write to the memory after argv[] array. We can fix this, or we can rewrite argv_split/free: void argv_free(char **argv) { kfree(argv[-1]); kfree(argv); } char **argv_split(gfp_t gfp, const char *str, int *argcp) { char *argv_str; bool was_space; char **argv, **argv_ret; int argc; argv_str = kstrndup(str, KMALLOC_MAX_SIZE, gfp); if (!argv_str) return NULL; argc = count_argc(argv_str); argv = kmalloc(sizeof(*argv) * (argc + 2), gfp); if (!argv) { kfree(argv_str); return NULL; } *argv = argv_str; argv_ret = ++argv; for (was_space = true; *argv_str; argv_str++) { if (isspace(*argv_str)) { was_space = true; *argv_str = 0; } else if (was_space) { was_space = false; *argv++ = argv_str; } } *argv = NULL; if (argcp) *argcp = argc; return argv_ret; } This way it uses a single kstrndup() to keep the arguments and it is always safe. But, whatever we do with argv_split(), it can hit the string "in between". Personally I think we do not really care, but... Perhaps we should add proc_dostring_lock() which takes some lock and modify the callers of argv_split() (or add argv_split_lock) ? Or perhaps we should introduce the rwsem which should protect every sysctl-string and proc_dostring() should take this lock? Help! I'd prefer to rewrite argv_split(), but I agree with any suggestion in advance. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/