Seems that here and there, osv_execve returns with thread_id still at 0,
instead of containing new thread id.
So a little race.
Then, get_all_app_threads either returns only ID of thread which called
osv_execve, or thread IDs of both thread which called osv_execve and
thread created by osv_execve.
I will somehow send te test code.
BR Justin
On 08/22/2016 04:54 PM, Justin Cinkelj wrote:
I tested it ,without my patches. It didn't work. CPU affinity is set
to wrong threads - to threads belonging to orted.so, not to mpi_app.so.
Maybe with_all_app_threads gets confused, as it seems to return thread
ID of threads belonging to orted.so, not to mpi_app.so.
I will try to modify tst-execve/tst-execve-payload to show that,
without using open mpi.
BR Justin
On 08/22/2016 02:09 PM, Nadav Har'El wrote:
On Mon, Aug 22, 2016 at 12:31 PM, Justin Cinkelj
<justin.cink...@xlab.si> wrote:
On 08/21/2016 04:38 PM, Nadav Har'El wrote:
I've been thinking about this some more, and why you had to
change your osv_execve() implementation - which already returned
a thread id, but the "wrong" one:
The "original sin", so to speak, is that both osv_execve() and
application::start() which it calls (through osv::run()), start
a new thread. So if we have thread A and run osv_execve(), it
creates thread B, and in it creates thread C to run the actual
application code. osv_execve() returned B, but you wanted it to
create C.
But why must we have both B and C? Wouldn't it be better if we
had just one?
osv::run() currently does:
auto app = osv::application::run(path, args, new_program, env);
app->join();
Yes, that's the main and only reason. The wrong tid was good
enough to later detect when newly started app terminated
(osv_waittid() from core/osv_execve.cc). But now I need to
setaffinity of threads in the new app. And the wrong thread is -
the wrong one...
What if instead it called a new int
osv::application::run_and_join(...)?
This run_and_join() will call a new start_and_join(), which
unlike start() will not create a new thread to run the code -
but rather run it in the thread it was called in. The
start_and_join() will need to also set the current_app field on
the existing thread correctly (and revert it when the main
function returns).
If we do this, osv_execve's returned pid will be the correct
one, and won't need to change.
Does this idea make sense to you?
Sounds good to me. I guess best I just drop the current patch,
and wait on the new one.
Hi Justin, I just sent a "RFC" patch to the OSv mailing list along
the above lines. The end result is that it changes osv::run, which
osv_execve() uses, to not create a new thread, so the thread id
returned by your original osv_execve will be correct - without the
changes you suggested in your patch.
I sent the patch as an RFC because I didn't test it enough (though
all the tests of "make check" pass).
Could you please check if this RFC patch fixes osv_execve to do what
you wanted it to do, without your "main_thread" patches?
Thanks,
Nadav.
--
You received this message because you are subscribed to the Google Groups "OSv
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to osv-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.