No, there is no warning about a stubbed pthread_key_delete - or any other stubbed function.

BR Justin

On 08/28/2016 02:23 PM, Nadav Har'El wrote:
By the way, a question:

When running with "-V" (--verbose), do you see a warning about a stubbed "pthread_key_delete" function?


--
Nadav Har'El
n...@scylladb.com

On Sun, Aug 28, 2016 at 12:42 PM, Nadav Har'El <n...@scylladb.com> wrote:


    On Fri, Aug 26, 2016 at 2:55 PM, Justin Cinkelj
    <justin.cink...@xlab.si> wrote:

        Tested, and the returned thread ID is the correct one now. The
        misc-execve.so + /misc-execve-payload.so test wasn't able to
        show any problem.


    Good, so I'll commit that patch.


        But I noticed a different problem when running Open MPI.
        It's possible that thread exit handlers are associated with
        the wrong thread.


    Thanks. I think I know what's going on:

    The new application is no longer running in a new thread - but
    rather in its parent's thread.
    So if the new application registers some sort of thread-exit
    callbacks, these won't be run until the thread exits - but this
    happens *after* the new application has been unloaded, so the
    callback code no longer exists.

    By the way, it seems our pthreads thread-specific-data support
    (pthread_key_create() et al.) does not support ELF namespaces well,
    because the list of destructors (tsd_dtor) is global, not
    per-namespace... I think in the past this was just a waste, a leak
    of keys, because if !tsd[i] the code in
    pthread_register_tsd_dtor_notifier did not try to call the
    destructor - but now it becomes an actual crash. I'll produce a
    workaround patch for you to test.


        ```
        sudo ./scripts/run.py -d -V -n -v -e '/usr/bin/mpirun -np 2 -H
        192.168.122.90 --allow-run-as-root -mca hwloc_base_verbose 10
        --report-bindings --bind-to core --map-by core --mca
        mpi_paffinity_alone 1 -mca rmaps_base_verbose 5 -mca
        rmaps_base_topology /topo/topo-1s-4c.xml -mca
        hwloc_base_mem_alloc_policy 'local_only' -mca
        hwloc_base_mem_bind_failure_action 1 /usr/lib/mpi_hello.so ' |
        tee aa
        ...
        Done
        main EXIT
        Done
        main EXIT
        thread_run_app_in_namespace ret = 0 tid=260
        [osv:00000] *** Process received signal ***
        [osv:00000] Signal: Segmentation fault (11)
        [osv:00000] Associated errno: No error information (-32768)
        [osv:00000] Signal code:  (3994985)
        [osv:00000] Failing at address: 0x100601035000
        page fault outside application, addr: 0x0000100601035000
        [registers]
        RIP: 0x00000000007d1c49 <???+8199241>
RFL: 0x0000000000010246 CS: 0x0000000000000008 SS: 0x0000000000000010
        RAX: 0x0000000000000005  RBX: 0x00002000005fec30  RCX:
        0x000010060103596c RDX: 0x00002000005ffc80
        RSI: 0x0000000000000000  RDI: 0x00002000005fea40  RBP:
        0x00002000005fed20 R8:  0x0000000000000004
        R9:  0x0000000000200000  R10: 0x000000000171ab30  R11:
        0x0000000000000000 R12: 0x00002000005fec30
        R13: 0x0000000000200000  R14: 0x000000000171ab30  R15:
        0x0000000000000000 RSP: 0x00002000005febb0
        Aborted

        [backtrace]
        0x000000000022bc39 <abort(char const*, ...)+270>
        0x00000000003c4f6b <???+3952491>
        0x00000000003c5108 <mmu::vm_fault(unsigned long,
        exception_frame*)+350>
        0x000000000048941e <page_fault+310>
        0x0000000000488306 <???+4752134>
        ```

        And gdb:

        (gdb) bt
        #0  processor::cli_hlt () at arch/x64/processor.hh:248
        #1  0x0000000000209613 in arch::halt_no_interrupts () at
        arch/x64/arch.hh:48
        #2  0x0000000000498a1e in osv::halt () at arch/x64/power.cc:24
        #3  0x000000000022bc60 in abort (fmt=0x99c32c "Aborted\n") at
        runtime.cc:130
        #4  0x000000000022bb2b in abort () at runtime.cc:96
        #5  0x00000000003c4f6c in mmu::vm_sigsegv
        (addr=17617972842496, ef=0xffff800005e23078) at core/mmu.cc:1316
        #6  0x00000000003c5109 in mmu::vm_fault (addr=17617972842496,
        ef=0xffff800005e23078) at core/mmu.cc:1338
        #7  0x000000000048941f in page_fault (ef=0xffff800005e23078)
        at arch/x64/mmu.cc:38
        #8  <signal handler called>
        #9  0x00000000007d1c49 in uw_frame_state_for ()
        #10 0x00000000007d3a99 in _Unwind_Backtrace ()
        #11 0x000000000066fe10 in backtrace (buffer=0x2000005fef50,
        size=32) at libc/misc/backtrace.cc:68
        #12 0x000010000607f5ed in opal_backtrace_print (file=0x12faae0
        <f>, prefix=0x2000005ff0a0 "[osv:00000] ", strip=2)
            at
        ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:47
        #13 0x000010000607c8bf in show_stackframe (signo=11,
        info=0x2000005ffd28, p=0x2000005ff870) at
        ../../../opal/util/stacktrace.c:336
        #14 0x000000000047df78 in call_signal_handler
        (frame=0x2000005ffc80) at arch/x64/signal.cc:85
        #15 <signal handler called>
        #16 0x000010060103596c in ?? ()
        #17 0x0000000000680d68 in
        pthread_private::__lambda5::operator()
        (__closure=0xffff900001e7f000) at libc/pthread.cc:60
        #18 0x000000000068347f in std::_Function_handler<void(),
        
pthread_private::pthread_register_tsd_dtor_notifier()::__lambda5>::_M_invoke(const
        std::_Any_data &) (__functor=...) at
        /usr/include/c++/4.8.2/functional:2071
        #19 0x000000000044a632 in std::function<void ()>::operator()()
        const (this=0xffff900001e80010) at
        /usr/include/c++/4.8.2/functional:2471
        #20 0x00000000005b90c8 in sched::run_exit_notifiers () at
        core/sched.cc:984
        #21 0x00000000005b9c9f in sched::thread::complete
        (this=0xffff800005e1e050) at core/sched.cc:1205
        #22 0x00000000005b5b43 in sched::thread_main_c
        (t=0xffff800005e1e050) at arch/x64/arch-switch.hh:165
        #23 0x0000000000489286 in thread_main () at arch/x64/entry.S:113

        (gdb) frame 17
        #17 0x0000000000680d68 in
        pthread_private::__lambda5::operator()
        (__closure=0xffff900001e7f000) at libc/pthread.cc:60
        60 tsd_dtor[i](val);
        (gdb) p i
        $1 = 3
        (gdb) p val
        $2 = (void *) 0xffffa00003247600
        (gdb) p tsd_dtor
        $3 = std::vector of length 128, capacity 128 = {0x100005c3596c
        <buffer_cleanup>, 0x1000060988f1 <buffer_cleanup>,
        0x10040103596c, 0x10060103596c,
          0x1004014988f1, 0x1006014988f1, 0x0, 0x0, 0x0, 0x0, 0x0,
        0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
        0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
          0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
        0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
        0x0, 0x0, 0x0, 0x0, 0x0,
          0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
        0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
        0x0, 0x0, 0x0, 0x0, 0x0,
          0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
        0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
        0x0, 0x0, 0x0, 0x0, 0x0,
          0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}

        (gdb) frame 21
        #21 0x00000000005b9c9f in sched::thread::complete
        (this=0xffff800005e1e050) at core/sched.cc:1205
        1205        run_exit_notifiers();
        (gdb) p this
        $4 = (sched::thread * const) 0xffff800005e1e050
        (gdb) p *this
        $5 = {<sched::timer_base::client> = {_vptr.client = 0xa42c70
        <vtable for sched::thread+16>, _timers_need_reload = false,
            _active_timers =
        
{<boost::intrusive::list_impl<boost::intrusive::listopt<boost::intrusive::detail::member_hook_traits<sched::timer_base,
        boost::intrusive::list_member_hook<boost::intrusive::none,
        boost::intrusive::none, boost::intrusive::none>,
        &sched::timer_base::client_hook>, unsigned long, true> >> =
        
{<boost::intrusive::detail::clear_on_destructor_base<boost::intrusive::list_impl<boost::intrusive::listopt<boost::intrusive::detail::member_hook_traits<sched::timer_base,
        boost::intrusive::list_member_hook<boost::intrusive::none,
        boost::intrusive::none, boost::intrusive::none>,
        &sched::timer_base::client_hook>, unsigned long, true> > >> =
        {<No data fields>}, static external_value_traits = false,
        static constant_time_size = true,
                static stateful_value_traits = <optimized out>,
                data_ =
        {<boost::intrusive::detail::member_hook_traits<sched::timer_base,
        boost::intrusive::list_member_hook<boost::intrusive::none,
        boost::intrusive::none, boost::intrusive::none>,
        &sched::timer_base::client_hook>> = {static link_mode =
        <optimized out>},
                  root_plus_size_ =
        {<boost::intrusive::detail::size_holder<true, unsigned long>>
        = {static constant_time_size = <optimized out>, size_ = 0},
                    root_ = {next_ = 0xffff800005e1e068, prev_ =
        0xffff800005e1e068}}}}, <No data fields>}}, _detach_state = {
            _M_i = sched::thread::detach_state::detached}, static
        priority_idle = inf, static priority_default = 1,
          static priority_infinity = 9.99999975e-06, _func =
        {<std::_Maybe_unary_or_binary_function<void>> = {<No data
        fields>}, <std::_Function_base> = {
              static _M_max_size = 16, static _M_max_align = 8,
        _M_functor = {_M_unused = {_M_object = 0xffffa00005cfc500,
                  _M_const_object = 0xffffa00005cfc500,
        _M_function_pointer = 0xffffa00005cfc500,
                  _M_member_pointer = (void
        (std::_Undefined_class::*)(std::_Undefined_class * const))
        0xffffa00005cfc500, this adjustment 7953674122929206130},
        _M_pod_data = "\000\305\317\005\000\240\377\377rsity an"},
              _M_manager = 0x683588
        
<std::_Function_base::_Base_manager<pthread_private::pthread::pthread(void*
        (*)(void*), void*, sigset_t, const
        pthread_private::thread_attr*)::__lambda6>::_M_manager(std::_Any_data
        &, const std::_Any_data &, std::_Manager_operation)>},
            _M_invoker = 0x683566 <std::_Function_handler<void(),
        pthread_private::pthread::pthread(void* (*)(void*), void*,
        sigset_t, const
        pthread_private::thread_attr*)::__lambda6>::_M_invoke(const
        std::_Any_data &)>}, _state = {exception_stack =
        0xffff800001a2e0f0 "!", <incomplete sequence \342>,
            rsp = 0xffff800005e1eb40, rbp = 0xffff800005e1eba0, rip =
        0x5b5690 <sched::thread::switch_to()+228>}, _tcb =
        0xffff900005c147c0, _runtime = {
            _priority = 1, _Rtt = 7.2966543e-08, _renormalize_count = 0},
          _detached_state =
        std::unique_ptr<sched::thread::detached_state> containing
        0xffffa00005d157e0, _attr = {_stack = {begin = 0x200000500000,
              size = 1048576, deleter = 0x681360
        <pthread_private::pthread::free_stack(sched::thread::stack_info)>},
        _pinned_cpu = 0x0, _detached = true,
            _name = {_M_elems = ">/usr/bin/mpiru"}},
        _migration_lock_counter = 1, _pinned = true, _arch = {
            interrupt_stack = "(c) 2004-2005 The University of
        Tennessee and The University\n#", ' ' <repeats 25 times>, "of
        Tennessee Research Foundation.  All rights\n#", ' ' <repeats
        25 times>, "reserved.\n# Copyright (c) 2004-2005 High "...,
            exception_stack =
        
"\211\004$M\211\320\377ЉE\370\203}\370\000\017\225\300\017\266\300H\205\300t?\213E\370\211\307\350V\250\000
        
E\344\235\000\017\251\000\000\063\213M\344\254\000\023\031\254\000\240b\370\372\377\213E\344\353\005\270&\001\362\r\311\303f\220UH\211\345H\211}\370H\213E\370\017\267@(\017\267\300\203\340\004\205\300k\000/]\303
        \000\002\000\252\001\020\224\213\000\a
        
\000\060\211u\360D\000\261H\213P0H\213E\360H\211\020d\000\003\"\000\061\203\354\020f\000\005&\000#U\360\274\001R\273\377\377\377\311g\000\220\203\354
        
H\211}\350\211u\321\000\301苀\230\000\000\000;E\344\177\nJ\000!\351S|\001\200E\350\213@x\215P\001\n\000"...},
        _id = 260, _interrupted = {<std::atomic_bool> = {_M_base =
        {_M_i = false}}, <No data fields>},
          _cleanup = {<std::_Maybe_unary_or_binary_function<void>> =
        {<No data fields>}, <std::_Function_base> = {static
        _M_max_size = 16,
              static _M_max_align = 8, _M_functor = {_M_unused =
        {_M_object = 0xffffa000056ba490, _M_const_object =
        0xffffa000056ba490,
                  _M_function_pointer = 0xffffa000056ba490,
                  _M_member_pointer = (void
        (std::_Undefined_class::*)(std::_Undefined_class * const))
        0xffffa000056ba490, this adjustment 6046151},
                _M_pod_data =
        "\220\244k\005\000\240\377\377\307A\\\000\000\000\000"},
              _M_manager = 0x68368f
        
<std::_Function_base::_Base_manager<pthread_private::pthread::pthread(void*
        (*)(void*), void*, sigset_t, const
        pthread_private::thread_attr*)::__lambda7>::_M_manager(std::_Any_data
        &, const std::_Any_data &, std::_Manager_operation)>},
            _M_invoker = 0x68366d <std::_Function_handler<void(),
        pthread_private::pthread::pthread(void* (*)(void*), void*,
        sigset_t, const
        pthread_private::thread_attr*)::__lambda7>::_M_invoke(const
        std::_Any_data &)>}, _tls = std::vector of length 1, capacity
        1 = {0xffff900005c14000 ""}, _app = true,
          _app_runtime = std::shared_ptr (count 5, weak 0)
        0xffffa00002776f58, _joiner = {_M_b = {_M_p = 0x0}},
          _runqueue_link =
        
{<boost::intrusive::detail::generic_hook<boost::intrusive::get_set_node_algo<void*,
        false>, boost::intrusive::member_tag,
        (boost::intrusive::link_mode_type)1, 0>> =
        {<boost::intrusive::detail::no_default_definer> = {<No data
        fields>}, <boost::intrusive::rbtree_node<void*>> = {
                parent_ = 0x0, left_ = 0x0, right_ = 0x0, color_ =
        boost::intrusive::rbtree_node<void*>::black_t}, <No data
        fields>}, <No data fields>},
          _wakeup_link = {_helper = 0xffffa00002776188, _next =
        0xffffa000056baf80}, static _s_idgen = 370, static _s_reaper =
        0xffff8000027df040,
          stat_switches = {_counter = {<std::__atomic_base<unsigned
        long>> = {_M_i = 571}, <No data fields>}}, stat_preemptions = {
            _counter = {<std::__atomic_base<unsigned long>> = {_M_i =
        159}, <No data fields>}}, stat_migrations = {
            _counter = {<std::__atomic_base<unsigned long>> = {_M_i =
        1}, <No data fields>}}, _total_cpu_time = {__r = 167458000},
          _cputime_estimator = {<std::__atomic_base<unsigned long>> =
        {_M_i = 702368892544957}, <No data fields>}}

        (gdb) p (*this)._attr
        $10 = {_stack = {begin = 0x200000500000, size = 1048576,
        deleter = 0x681360
        <pthread_private::pthread::free_stack(sched::thread::stack_info)>},

          _pinned_cpu = 0x0, _detached = true, _name = {_M_elems =
        ">/usr/bin/mpiru"}}


        Now in frame 21, _M_elems = ">/usr/bin/mpiru" makes me think
        that this is mpirun. And that one should not know anything
        about 0x10040103596c, 0x10060103596c, 0x1004014988f1,
        0x1006014988f1 from:
        $3 = std::vector of length 128, capacity 128 = {0x100005c3596c
        <buffer_cleanup>, 0x1000060988f1 <buffer_cleanup>,
        0x10040103596c, 0x10060103596c,
          0x1004014988f1, 0x1006014988f1

        The 0x10040xx and 0x10060xx pointers are from mpi_hello.so,
        which was started in a new ELF namespace.
        Hope this gdb output is helpful.

        BR Justin


        On 08/25/2016 01:29 PM, Nadav Har'El wrote:
        I just sent another RFC patch which should cause osv_execve
        to wait until app_runtime() is the new application's.
        I haven't really tested it much beyond the usual "make check"
        - can you please try it out?

        Thanks,
        Nadav.


        --
        Nadav Har'El
        n...@scylladb.com

        On Wed, Aug 24, 2016 at 7:21 AM, Justin Cinkelj
        <justin.cink...@xlab.si> wrote:



            On 08/23/2016 11:58 PM, Nadav Har'El wrote:

            On Tue, Aug 23, 2016 at 3:28 PM, Justin Cinkelj
            <justin.cink...@xlab.si> wrote:


                @nadav
                With the second RFC patch, osv_execve doesn't return
                thread_id==0 any more.


                (code after the RFC-2 patch) The
                application::start_and_join() line
                sched::thread::current()->set_app_runtime(runtime());
                has to finish, so that with_all_app_threads works as
                desired. I think that's in line with your
                explanation. If I add sleep(1) just before it, the
                problem becomes 100% reproducible (it was 10-20%
                reproducible before).


            Yes, so I think we understand now why the problem
            happens. Now the question is how to fix it.

            I have some ideas on how it might be possible to fix it,
            i.e., delay osv_execve()'s return until the new thread
            gets its new app_runtime setting.

            However, I started wondering whether we should fix
            anything in osv_execve():

            osv_execve() promises to return the new thread id, and
            that it does (after my last patch).
            However, it doesn't promise anything about how far along
            this thread went: Did it load the executable? Start to
            run it? Did it even set the thread's app_runtime? We
            don't know.
            The question becomes, if we start promising more, why
            the app_runtime thing? Isn't this just one of the
            arbitrary things that happen during the starting of the
            application - why promise that in particular?
            If osv_execve() returned an app instead of a thread id,
            things would be somewhat different, but even then, there
            is a question whether it is fine for osv_execve() to
            return an app that no thread yet belongs to - or whether
            we should wait until at least the one thread we created
            belongs to this app.

            So this got me thinking: what if we just decide that
            osv_execve()'s caller is not guaranteed the app_runtime
            was set?
            The caller (your MPI code looking for threads to
            setaffinity for) would just need to loop checking
            find_by_id(tid)->app_runtime(), stopping if find_by_id
            can't find the thread (this means it already exited!) or
            if app_runtime() changed from what
            sched::thread::current()->app_runtime() was.
            Alternatively, if the setaffinity code already works in
            a loop (since threads can be created after startup!),
            it's even simpler: Just check if
            find_by_id(tid)->app_runtime() is still the same as
            current()->app_runtime() - and if it is, just don't do
            anything (a later iteration of the loop will find the
            threads).

            Does this last plan make sense?
            Yes. No.
            Yes, that someone has to wait until app_runtime is set.
            But as user of OSv, I don't want to know all the OSv
            internal details, and how to work around them.
            If osv_execve returns before the new thread's app_runtime
            is set, then maybe osv_get_all_app_threads (from a patch
            I haven't yet sent to the list) could do the waiting on
            app_runtime.
            But it would have to check whether new_thread.app_runtime
            is different from caller_thread.app_runtime. And that
            would not work if some thread wants to get a list of its
            own threads - caller_thread == new_thread.
            So the wait loop should cover that extra case (not that
            the current Open MPI code needs it; it's just to be a bit
            more general).
            Does that sound acceptable?





