Package: ruby2.3
Version: 2.3.3-1+deb9u1
Severity: important
Tags: upstream patch
Forwarded: https://bugs.ruby-lang.org/issues/13794

Hello,

After upgrading to stretch, we keep finding Ruby processes (Puppet
agents in particular) stuck in a sched_yield busy loop. The stuck process
is always a forked child of the main Puppet agent, running inside a
timeout block.

The backtrace of the process is the following:

(gdb) thread apply all bt

Thread 2 (Thread 0x7f2dc7904700 (LWP 11226)):
#0  0x00007f2dc63bb6ad in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f2dc73fba62 in timer_thread_sleep (gvl=0x5628917b3f28) at thread_pthread.c:1455
#2  thread_timer (p=0x5628917b3f28) at thread_pthread.c:1563
#3  0x00007f2dc7045494 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007f2dc63c4aff in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7f2dc78fc700 (LWP 11224)):
#0  0x00007f2dc63adca7 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f2dc73fbac5 in native_stop_timer_thread () at thread_pthread.c:1664
#2  rb_thread_stop_timer_thread () at thread.c:3902
#3  0x00007f2dc7341c42 in before_exec_non_async_signal_safe () at process.c:1175
#4  before_exec () at process.c:1181
#5  rb_f_exec (argc=<optimized out>, argv=<optimized out>) at process.c:2576

The offending code is in native_stop_timer_thread() (thread_pthread.c):

native_stop_timer_thread(void)
{
    int stopped;
    stopped = --system_working <= 0;

    if (TT_DEBUG) fprintf(stderr, "stop timer thread\n");
#if USE_SLEEPY_TIMER_THREAD
    if (stopped) {
        /* prevent wakeups from signal handler ASAP */
        timer_thread_pipe.owner_process = 0;  

        /*   
         * however, the above was not enough: the FD may already be
         * captured and in the middle of a write while we are running,
         * so wait for that to finish:
         */  
        while (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0)) {
            native_thread_yield();
        }   
[..]
}

Thread 1 is spinning on `timer_thread_pipe.writing`, which something has
erroneously left set to 1, so the loop condition never becomes false:

(gdb) print timer_thread_pipe
$1 = {normal = {3, 4}, low = {5, 6}, owner_process = 0, writing = 1}
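
To illustrate why such a loop cannot terminate in a forked child, here is a
minimal sketch. It is not the Ruby code itself. It assumes (this is not
confirmed by the backtrace above) that the counter was nonzero at fork time
because another thread was in the middle of a write. The names `writing`,
`wait_for_writers` and `child_gets_stuck` are hypothetical stand-ins, and the
wait loop is bounded by `max_spins` so that the example terminates.

```c
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for timer_thread_pipe.writing. */
static atomic_int writing;

/*
 * Bounded version of the wait loop: yield until `writing` drops to 0.
 * Returns the number of yields performed, or -1 once max_spins is exhausted.
 */
static int wait_for_writers(int max_spins)
{
    int spins = 0;

    while (atomic_load(&writing) != 0) {
        if (spins >= max_spins)
            return -1;          /* the real loop would keep spinning here */
        sched_yield();
        spins++;
    }
    return spins;
}

/*
 * Fork while `writing` is 1, as if another thread were mid-write (assumption).
 * Returns 1 if the child gave up waiting, 0 if it saw the counter reach 0,
 * and -1 on error.
 */
static int child_gets_stuck(int max_spins)
{
    int status;
    pid_t pid;

    atomic_store(&writing, 1);
    pid = fork();
    if (pid < 0) {
        atomic_store(&writing, 0);
        return -1;
    }
    if (pid == 0) {
        /* Child: it has its own copy of the counter and no writer thread,
         * so nothing will ever bring the counter back to 0. */
        _exit(wait_for_writers(max_spins) < 0 ? 1 : 0);
    }

    /* Parent: the "writer" finishes, but only in the parent's memory. */
    atomic_store(&writing, 0);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `child_gets_stuck(1000)` returns 1: the child exhausts its spin budget
because the parent's reset of the counter is invisible to it. With an unbounded
loop, as in the excerpt above, the child would sit in sched_yield() forever,
which matches what the backtrace shows.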


Our case seems identical to the upstream bug report [1]. We have applied the
patch [2] by Eric Wong; the problem appears to be resolved, and the patch has
not caused any other problems.



[1] https://bugs.ruby-lang.org/issues/13794
[2] https://80x24.org/spew/20170809232533.14932-...@80x24.org/raw


Kind regards,
-- 
Gregory Potamianos
Skroutz S.A
greg...@skroutz.gr


-- System Information:
Debian Release: 9.0
  APT prefers stable-debug
  APT policy: (500, 'stable-debug'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-3-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages ruby2.3 depends on:
ii  libc6                 2.24-11+deb9u1
ii  libgmp10              2:6.1.2+dfsg-1
ii  libruby2.3            2.3.3-1+deb9u1
ii  rubygems-integration  1.11

Versions of packages ruby2.3 recommends:
ii  fonts-lato    2.0-1
ii  libjs-jquery  3.1.1-2

ruby2.3 suggests no packages.

-- no debconf information
