Re: [PATCH 1/1] poweroff: change orderly_poweroff() to use schedule_work()

Oleg Nesterov Fri, 15 Mar 2013 09:41:50 -0700

On 03/14, Andrew Morton wrote:
>
> On Wed, 13 Mar 2013 18:47:05 +0100 Oleg Nesterov <[email protected]> wrote:
>
> > This means that orderly_poweroff() becomes async even if we do not
> > run the command and always succeeds, schedule_work() can only fail
> > if the work is already pending. We can export __orderly_poweroff()
> > and change the non-atomic callers which want the old semantics.
> >
> > ...
> >
> > @@ -2218,21 +2237,9 @@ static int __orderly_poweroff(void)
> >   */
> >  int orderly_poweroff(bool force)
> >  {
> > +   if (force) /* do not override the pending "true" */
> > +           poweroff_force = true;
> > +   schedule_work(&poweroff_work);
> > +   return 0;
> >  }
>
> afaict the current version of orderly_poweroff() will never return -
> either __orderly_poweroff() will block until the machine shuts down or
> kernel_power_off() will do so.


Note that __orderly_poweroff() uses UMH_WAIT_EXEC, not UMH_WAIT_PROC,
so it returns right after /sbin/poweroff starts to execute.

So it is already asynchronous unless execve() fails.

> However with this patch there is a path via which orderly_poweroff()
> can return to its caller, I think?

See above, but please also read the changelog.

With this patch orderly_poweroff() is always async, even if exec fails,
but

> If so, the caller might be rather
> surprised and we're exercising never-before-used code paths.  In fact
> if the surprised caller goes oops, the poweroff might not occur at all.

This should not happen.

Anyway. Please also note that now we can export __orderly_poweroff() and
probably change it, it can have another argument "bool use_UMH_WAIT_PROC".

        int __orderly_poweroff(bool force, bool sync)
        {
                int wait = sync ? UMH_WAIT_EXEC : UMH_WAIT_EXEC;

                ret = call_usermodehelper(argv[0], argv, envp, wait);

                if (force) {
                        // EXEC failed or /sbin/poweroff didn't do its work
                        if (ret || sync)
                                kernel_power_off();
                }
        }

The non-atomic callers can use __orderly_poweroff(sync => true).

---------------------------------------------------------------------------
And, Andrew, et all... Could you help with another mentioned problem? It is
really trivial, but exactly because it is trivial I do not know what should
I do.

To remind, say, argv_split(poweroff_cmd) can race with sysctl changing this
string, in this case it can write to the memory after argv[] array. We can
fix this, or we can rewrite argv_split/free:

        void argv_free(char **argv)
        {
                kfree(argv[-1]);
                kfree(argv);
        }

        char **argv_split(gfp_t gfp, const char *str, int *argcp)
        {
                char *argv_str;
                bool was_space;
                char **argv, **argv_ret;
                int argc;

                argv_str = kstrndup(str, KMALLOC_MAX_SIZE, gfp);
                if (!argv_str)
                        return NULL;

                argc = count_argc(argv_str);
                argv = kmalloc(sizeof(*argv) * (argc + 2), gfp);
                if (!argv) {
                        kfree(argv_str);
                        return NULL;
                }

                *argv = argv_str;
                argv_ret = ++argv;
                for (was_space = true; *argv_str; argv_str++) {
                        if (isspace(*argv_str)) {
                                was_space = true;
                                *argv_str = 0;
                        } else if (was_space) {
                                was_space = false;
                                *argv++ = argv_str;
                        }
                }
                *argv = NULL;

                if (argcp)
                        *argcp = argc;
                return argv_ret;
        }

This way it uses a single kstrndup() to keep the arguments and it is
always safe.

But, whatever we do with argv_split(), it can hit the string "in between".
Personally I think we do not really care, but...

Perhaps we should add proc_dostring_lock() which takes some lock and
modify the callers of argv_split() (or add argv_split_lock) ?

Or perhaps we should introduce the rwsem which should protect every
sysctl-string and proc_dostring() should take this lock?

Help! I'd prefer to rewrite argv_split(), but I agree with any suggestion
in advance.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] poweroff: change orderly_poweroff() to use schedule_work()

Reply via email to