Hello,

For a while now I have been hitting an intermittent issue.
My understanding is that the mechanism used to abort an evaluation in the
(repl) with Ctrl-C is not thread safe.

I have seen that when we run functions that allocate a lot, hitting Ctrl-C
sometimes leaves the repl stuck, with everything blocked.
With GDB we can see a thread calling GC_malloc_kind that is stuck in
__lll_lock_wait.

This thread is deadlocked with itself: it is trying to lock a mutex that it
already holds.
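
To illustrate the kind of self-deadlock I mean, here is a minimal C sketch.
It is not taken from Bigloo or from the GC, and the function name
relock_same_thread is made up for the example. It assumes a POSIX system.
An error-checking mutex reports EDEADLK when its owner locks it again, whereas
with a default mutex the second lock typically blocks forever, which is what
the __lll_lock_wait frame looks like.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: lock the same mutex twice from one thread and
   return the result of the second lock.  The mutex is created with
   PTHREAD_MUTEX_ERRORCHECK so that the second lock fails with EDEADLK
   instead of blocking. */
int relock_same_thread(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);       /* first lock succeeds */
    rc = pthread_mutex_lock(&m);  /* second lock by the current owner */

    pthread_mutex_unlock(&m);     /* release the first lock */
    pthread_mutex_destroy(&m);
    return rc;
}
```

If an interrupt handler runs on a thread that is in the middle of an
allocation and then allocates again, I would expect the same pattern, but
that is my guess and not something I have confirmed in the sources.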

I'm not really sure whether the GC or Bigloo should handle this.
But in my case it is not the only issue: sometimes when I hit Ctrl-C my tool
quits silently, or quits with something like:

^C*** ERROR:unwind-until!, #<pthread:batch>
exit out of thread dynamic scope -- #unspecified

So I feel that it is more likely coming from Bigloo...
(note that the "batch" pthread is not the one running the (repl))


=======================================================
.scm testcase:
=======================================================
(module test-main
        (main main)
        (library pthread)
        (export make-some-allocation
                make-pthread-allocation)
        (eval (export-all)))

;; Run the allocation loop in a separate pthread.
(define (make-pthread-allocation)
  (let ((th (instantiate::pthread (body (lambda ()
                                 (make-some-allocation))))))
    (thread-start-joinable! th)))

;; Allocate a lot: build a list of one million freshly formatted strings.
(define (make-some-allocation)
  (let ((res '()))
    (for-each
     (lambda (x)
       (set! res (cons (format "Number is: ~a" x) res)))
     (iota 1000000))
    res))

(define (main argv)
  (repl))
=======================================================

Compilation with bigloo -cg


Usage:
Just launch (make-some-allocation) and (make-pthread-allocation) several
times, hitting Ctrl-C in between.
At some point the prompt "1:=> " is no longer printed.



=======================================================
GDB debug session:
=======================================================

(gdb) info thread
  8 Thread 0x7ffff6e0b700 (LWP 1891)  0x00000038c6e0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7ffff640a700 (LWP 1892)  0x00000038c6e0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x7ffff5a09700 (LWP 1893)  0x00000038c6e0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7ffff5008700 (LWP 1894)  0x00000038c6e0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7ffff4607700 (LWP 1895)  0x00000038c6e0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7ffff3c06700 (LWP 1896)  0x00000038c6e0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7ffff3205700 (LWP 1897)  0x00000038c6e0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7ffff6e0d700 (LWP 1890)  0x00000038c6e0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff6e0d700 (LWP 1890))]#0  0x00000038c6e0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00000038c6e0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00000038c6e09508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00000038c6e093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007ffff7495c51 in GC_generic_malloc_many () from bigloo/4.3b/libbigloogc_fth-4.3b.so
#4  0x00007ffff74a0c9e in GC_malloc_kind () from bigloo4.3b-g5.1.v3/lib/bigloo/4.3b/libbigloogc_fth-4.3b.so
#5  0x00007ffff78d52a1 in make_fx_procedure () from bigloo4.3b-g5.1.v3/lib/bigloo/4.3b/libbigloo_s-4.3b.so
#6  0x00007ffff78c6719 in BGl_zc3exitza31380ze3ze70z64zz__evalz00 () from bigloo4.3b-g5.1.v3/lib/bigloo/4.3b/libbigloo_s-4.3b.so
#7  0x00007ffff790801f in BGl_zc3exitza31367ze3ze70z64zz__evalz00 () from bigloo4.3b-g5.1.v3/lib/bigloo/4.3b/libbigloo_s-4.3b.so
#8  0x00007ffff790a244 in BGl_replz00zz__evalz00 () from bigloo4.3b-g5.1.v3/lib/bigloo/4.3b/libbigloo_s-4.3b.so
#9  0x0000000000401e11 in bigloo_main () at test-main.c:263
#10 0x00007ffff78c780b in _bigloo_main () from bigloo4.3b-g5.1.v3/lib/bigloo/4.3b/libbigloo_s-4.3b.so
#11 0x00000038c6a1ed5d in __libc_start_main () from /lib64/libc.so.6
#12 0x00000000004017d1 in _start ()
(gdb) info reg
rax            0xfffffffffffffe00  -512
rbx            0x3                 3
rcx            0xffffffffffffffff  -1
rdx            0x2                 2
rsi            0x80                128
rdi            0x7ffff76ed320      140737344623392
rbp            0x30                0x30
rsp            0x7fffffffcc60      0x7fffffffcc60
r8             0x7ffff76ed320      140737344623392
r9             0x762               1890
r10            0x0                 0
r11            0x246               582
r12            0x30                48
r13            0x7ffff76ed540      140737344623936
r14            0x30                48
r15            0x30                48
rip            0x38c6e0e264        0x38c6e0e264 <__lll_lock_wait+36>
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) print *((int*)(0x7ffff76ed320)+2)
$1 = 1890
=======================================================


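The value printed above is the owner of the mutex whose address is in rdi.
It is 1890, which is the LWP of thread 1 itself.
Here is the methodology used to find which thread is holding the mutex: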
https://en.wikibooks.org/wiki/Linux_Applications_Debugging_Techniques/Deadlocks
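
For reference, the GDB print at the end of the session can be sketched in C as
follows. This assumes glibc on Linux x86_64, where the owner's kernel thread
id is stored as the third int of pthread_mutex_t. That layout is an internal
detail and not a public API, so the code may not work elsewhere. The helper
names current_lwp and mutex_owner_lwp are made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Kernel thread id of the caller, i.e. the LWP number that GDB prints. */
int current_lwp(void)
{
    return (int)syscall(SYS_gettid);
}

/* ASSUMPTION (glibc internal layout, x86_64): the owner's thread id is
   the third int inside pthread_mutex_t.  This is the same read as
   `print *((int*)(addr)+2)` in GDB. */
int mutex_owner_lwp(const pthread_mutex_t *m)
{
    int owner;
    memcpy(&owner, (const char *)m + 2 * sizeof(int), sizeof owner);
    return owner;
}
```

Calling mutex_owner_lwp on a mutex that the current thread has just locked
should return the same value as current_lwp(), under the assumptions above.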

Does anyone have any idea about this issue?

Best Regards.
Pierre-Francois





