Seems that here and there, osv_execve returns with thread_id still at 0, instead of containing new thread id.
So a little race.
Then, get_all_app_threads either returns only ID of thread which called osv_execve, or thread IDs of both thread which called osv_execve and thread created by osv_execve.

I will somehow send te test code.

BR Justin

On 08/22/2016 04:54 PM, Justin Cinkelj wrote:
I tested it ,without my patches. It didn't work. CPU affinity is set to wrong threads - to threads belonging to orted.so, not to mpi_app.so. Maybe with_all_app_threads gets confused, as it seems to return thread ID of threads belonging to orted.so, not to mpi_app.so. I will try to modify tst-execve/tst-execve-payload to show that, without using open mpi.

BR Justin

On 08/22/2016 02:09 PM, Nadav Har'El wrote:

On Mon, Aug 22, 2016 at 12:31 PM, Justin Cinkelj <justin.cink...@xlab.si> wrote:



    On 08/21/2016 04:38 PM, Nadav Har'El wrote:


    I've been thinking about this some more, and why you had to
    change your osv_execve() implementation - which already returned
    a thread id, but the "wrong" one:

    The "original sin", so to speak, is that both osv_execve() and
    application::start() which it calls (through osv::run()), start
    a new thread. So if we have thread A and run osv_execve(), it
    creates thread B, and in it creates thread C to run the actual
    application code. osv_execve() returned B, but you wanted it to
    create C.
    But why must we have both B and C? Wouldn't it be better if we
    had just one?

    osv::run() currently does:

        auto app = osv::application::run(path, args, new_program, env);
        app->join();
    Yes, that's the main and only reason. The wrong tid was good
    enough to later detect when newly started app terminated
    (osv_waittid() from core/osv_execve.cc). But now I need to
    setaffinity of threads in the new app. And the wrong thread is -
    the wrong one...




    What if instead it called a new int
    osv::application::run_and_join(...)?
    This run_and_join()  will call a new start_and_join(), which
    unlike start() will not create a new thread to run the code -
    but rather run it in the thread it was called in. The
    start_and_join() will need to also set the current_app field on
    the existing thread correctly (and revert it when the main
    function returns).

    If we do this, osv_execve's returned pid will be the correct
    one, and won't need to change.

    Does this idea make sense to you?
    Sounds good to me. I guess best I just drop the current patch,
    and wait on the new one.


Hi Justin, I just sent a "RFC" patch to the OSv mailing list along the above lines. The end result is that it changes osv::run, which osv_execve() uses, to not create a new thread, so the thread id returned by your original osv_execve will be correct - without the changes you suggested in your patch.

I sent the patch as an RFC because I didn't test it enough (though all the tests of "make check" pass). Could you please check if this RFC patch fixes osv_execve to do what you wanted it to do, without your "main_thread" patches?

Thanks,
Nadav.



--
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to