Hi David, Thank you for the minimal test case. This is enormously helpful.
David Kastrup <d...@gnu.org> writes: > The symptom likely occurs when an object is collected and consequently > its free_smob function gets called. After some investigation, I believe I've confirmed that this is exactly what's happening. I added a field to the Smob class that is set to 12345678 when the object is created (and 21345678 at a later point), and then 87654321 when free_smob is called. I then observed that the segfault happens when the smob_mark function is called on a smob whose tag field is valid, but whose corresponding C++ object has 87654321 in the special field, indicating that its smob_free method was already called. This happened with GC_MARKERS=1. I believe this is a bug in BDWGC. gc_mark.h warns: --8<---------------cut here---------------start------------->8--- /* WARNING: Such a mark procedure may be invoked on an unused object */ /* residing on a free list. Such objects are cleared, except for a */ /* free list link field in the first word. Thus mark procedures may */ /* not count on the presence of a type descriptor, and must handle this */ /* case correctly somehow. */ --8<---------------cut here---------------end--------------->8--- and protect Guile users from this issue by checking that the tag on the smob cell (first word) is correct before calling the user-defined smob_mark function. However, in this case, the mark procedure was called on a smob cell whose first word was the correct tag and whose second word was non-zero and clearly a pointer to an already deleted instance of the C++ Smob class. I think the next step is to remove libguile from this test program, and turn it into a minimal demonstration of this apparent BDWGC bug, for submission to the BDWGC developers. FYI, here are the changes I made to the test program and the results I observed. First, some changes that I think might be needed for GC safety: --8<---------------cut here---------------start------------->8--- --- smobs.hh.ORIG 2015-02-15 13:35:03.000000000 -0500 +++ smobs.hh 2015-02-28 23:42:06.346803668 -0500 @@ -270,8 +271,9 @@ } SCM unprotected_smobify_self () { - self_scm_ = Smob_base<Super>::register_ptr (static_cast<Super *> (this)); - return self_scm_; + SCM s = Smob_base<Super>::register_ptr (static_cast<Super *> (this)); + self_scm_ = s; + return s; } void protect () { @@ -279,12 +281,12 @@ } SCM unprotect () { - unprotect_smob (self_scm_, &protection_cons_); - return self_scm_; + SCM s = self_scm_; + unprotect_smob (s, &protection_cons_); + return s; } void smobify_self () { - self_scm_ = unprotected_smobify_self (); - protect (); + protect_smob (unprotected_smobify_self (), &protection_cons_); } SCM self_scm () const { return self_scm_; } }; --8<---------------cut here---------------end--------------->8--- The changes to 'unprotected_smobify_self' and 'smobify_self" are probably only needed for a multithreaded program. The idea is to try to ensure that the smob is protected from GC by being on the stack (and not merely in the malloc/free heap) until it is protected. The change to 'unprotect' is along the same lines, and might be needed even for single-threaded programs. I don't know that 'scm_gc_unprotect_object' (called from 'unprotect_smob') will guarantee not to free the object passed to it if there are no other visible references to it. However, none of these changes fixed the problem here. BTW, one other thing to keep in mind: it will never be enough if the only GC-visible reference to an object is the C++ Smob object. There must always be a GC-visible reference to the SCM value. --8<---------------cut here---------------start------------->8--- --- smobs.hh.ORIG 2015-02-15 13:35:03.000000000 -0500 +++ smobs.hh 2015-02-28 23:42:06.346803668 -0500 @@ -263,6 +263,7 @@ protected: Smob () : self_scm_ (SCM_UNDEFINED), protection_cons_ (SCM_EOL) { }; public: + volatile int mhw_valid; static size_t free_smob (SCM obj) { delete Smob_base<Super>::unregister_ptr (obj); --- smobs.tcc.ORIG 2015-02-16 08:29:02.000000000 -0500 +++ smobs.tcc 2015-02-28 23:30:19.563298917 -0500 @@ -50,6 +50,7 @@ // announce the memory as being GC-controlled before even // allocating the controlling smob. SCM s = SCM_UNDEFINED; + p->mhw_valid = 12345678; SCM_NEWSMOB (s, smob_tag (), p); scm_gc_register_collectable_memory (p, sizeof (*p), smob_name_.c_str ()); return s; @@ -94,6 +95,9 @@ Smob_base<Super>::unregister_ptr (SCM obj) { Super *p = Super::unchecked_unsmob (obj); + if (p->mhw_valid != 12345678 && p->mhw_valid != 21345678) + abort (); + p->mhw_valid = 87654321; scm_gc_unregister_collectable_memory (p, sizeof (*p), smob_name_.c_str ()); return p; } --- test.cc.ORIG 2015-02-16 10:33:24.000000000 -0500 +++ test.cc 2015-02-28 23:28:34.690778883 -0500 @@ -28,6 +28,7 @@ Family::Family (int totals, int branch) { smobify_self (); + mhw_valid = 21345678; for (int l=0; l<branch; l++) { if (l >= totals) --8<---------------cut here---------------end--------------->8--- Here's what I observed: --8<---------------cut here---------------start------------->8--- mhw@jojen:~/guile-bug$ GC_MARKERS=1 ./test 2 20000 40 Iteration 0 Iteration 1 Iteration 2 Iteration 3 Segmentation fault (core dumped) mhw@jojen:~/guile-bug$ gdb ./test core GNU gdb (GDB) 7.8.2 Copyright (C) 2014 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i686-pc-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./test...done. [New LWP 30663] warning: Could not load shared library symbols for linux-gate.so.1. Do you need "set solib-search-path" or "set sysroot"? [Thread debugging using libthread_db enabled] Using host libthread_db library "/gnu/store/3g20rdmnavpblsmgppyl8jhg67nidhjk-glibc-2.20/lib/libthread_db.so.1". Core was generated by `./test 2 20000 40'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0a2abde3 in ?? () warning: File "/gnu/store/7a5b7fqnxzbwhx7k2bhbc4i1nliq53rl-gcc-4.9.2-lib/lib/libstdc++.so.6.0.20-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/gnu/store/3lhr8q28q6f59774di9av7ncy09jd55d-guile-2.0.11/lib/libguile-2.0.so.22.7.2-gdb.scm". To enable execution of this file add add-auto-load-safe-path /gnu/store/7a5b7fqnxzbwhx7k2bhbc4i1nliq53rl-gcc-4.9.2-lib/lib/libstdc++.so.6.0.20-gdb.py line to your configuration file "/home/mhw/.gdbinit". To completely disable this security protection add set auto-load safe-path / line to your configuration file "/home/mhw/.gdbinit". For more information about this security protection see the "Auto-loading safe path" section in the GDB manual. E.g., run from the shell: info "(gdb)Auto-loading safe path" (gdb) bt #0 0x0a2abde3 in ?? () #1 0xb766dc4a in smob_mark (addr=0xa15f6e0, mark_stack_ptr=0x9e5a320, mark_stack_limit=0x9e62000, env=0) at smob.c:335 #2 0xb758664a in GC_mark_from (mark_stack_top=0x9e5a320, mark_stack=0x9e5a000, mark_stack_limit=0x9e62000) at mark.c:737 #3 0xb7587c07 in GC_mark_some (cold_gc_frame=0xbf874ed8 "pVZ\267M\332W\267\330N\207\277") at mark.c:386 #4 0xb757da4d in GC_stopped_mark (stop_func=stop_func@entry=0xb757d5b0 <GC_never_stop_func>) at alloc.c:637 #5 0xb757e4a9 in GC_try_to_collect_inner (stop_func=0xb757d5b0 <GC_never_stop_func>) at alloc.c:456 #6 0xb757ee00 in GC_collect_or_expand (needed_blocks=needed_blocks@entry=1, ignore_off_page=ignore_off_page@entry=0, retry=retry@entry=0) at alloc.c:1266 #7 0xb757effc in GC_allocobj (gran=gran@entry=3, kind=kind@entry=1) at alloc.c:1355 #8 0xb758440b in GC_generic_malloc_inner (lb=lb@entry=24, k=k@entry=1) at malloc.c:133 #9 0xb7581e8a in GC_register_finalizer_inner (obj=obj@entry=0xa005720, fn=fn@entry=0xb766d970 <finalize_smob>, cd=cd@entry=0x0, ofn=ofn@entry=0xbf875098, ocd=ocd@entry=0xbf87509c, mp=mp@entry=0xb75811c0 <GC_null_finalize_mark_proc>) at finalize.c:524 #10 0xb75821e5 in GC_register_finalizer_no_order (obj=obj@entry=0xa005720, fn=fn@entry=0xb766d970 <finalize_smob>, cd=cd@entry=0x0, ofn=ofn@entry=0xbf875098, ocd=ocd@entry=0xbf87509c) at finalize.c:575 #11 0xb761bfbb in scm_i_set_finalizer (obj=obj@entry=0xa005720, proc=proc@entry=0xb766d970 <finalize_smob>, data=data@entry=0x0) at finalizers.c:44 #12 0xb766e1f8 in scm_i_new_smob (tc=7039, data=171214120) at smob.c:416 #13 0x08049b12 in scm_new_smob (tc=7039, data=171214120) at /gnu/store/3lhr8q28q6f59774di9av7ncy09jd55d-guile-2.0.11/include/guile/2.0/libguile/smob.h:91 #14 0x0804a8dc in Smob_base<Family>::register_ptr (p=0xa348528) at smobs.tcc:54 #15 0x0804a281 in Smob<Family>::unprotected_smobify_self (this=0xa34852c) at smobs.hh:274 #16 0x08049d0a in Smob<Family>::smobify_self (this=0xa34852c) at smobs.hh:289 #17 0x0804960c in Family::Family (this=0xa348528, totals=8, branch=2) at test.cc:30 #18 0x08049658 in Family::Family (this=0xa2a1638, totals=17, branch=2) at test.cc:36 #19 0x08049658 in Family::Family (this=0xa2a1758, totals=37, branch=2) at test.cc:36 #20 0x08049658 in Family::Family (this=0xa349478, totals=76, branch=2) at test.cc:36 #21 0x08049658 in Family::Family (this=0xa349008, totals=154, branch=2) at test.cc:36 #22 0x08049658 in Family::Family (this=0xa347318, totals=310, branch=2) at test.cc:36 #23 0x08049658 in Family::Family (this=0xa29d938, totals=623, branch=2) at test.cc:36 #24 0x08049658 in Family::Family (this=0xa2998b8, totals=1248, branch=2) at test.cc:36 #25 0x08049658 in Family::Family (this=0xa27f818, totals=2498, branch=2) at test.cc:36 #26 0x08049658 in Family::Family (this=0xa343608, totals=4998, branch=2) at test.cc:36 #27 0x08049658 in Family::Family (this=0xa1e2a88, totals=9998, branch=2) at test.cc:36 #28 0x08049658 in Family::Family (this=0x9f9a3b8, totals=19999, branch=2) at test.cc:36 #29 0x0804989f in workload (avv=0xbf8759f4) at test.cc:72 #30 0xb7608480 in c_body (d=0xbf875884) at continuations.c:517 #31 0xb76955de in apply_catch_closure (clo=#<smob catch-closure 9ffab20>, args=()) at throw.c:140 #32 0xb766d8cf in apply_1 (smob=#<smob catch-closure 9ffab20>, a=()) at smob.c:142 #33 0xb76ac662 in vm_regular_engine (vm=#<vm 9ee2060>, program=#<program 9ee04c0>, argv=<optimized out>, nargs=<optimized out>) at vm-i-system.c:858 #34 0xb76ae621 in scm_c_vm_run (vm=#<vm 9ee2060>, program=program@entry=#<program 9f5ddc8>, argv=argv@entry=0xbf875790, nargs=nargs@entry=4) at vm.c:768 #35 0xb76135b4 in scm_call_4 (proc=#<program 9f5ddc8>, arg1=arg1@entry=#t, arg2=arg2@entry=#<smob catch-closure 9ffab20>, arg3=arg3@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>, arg4=arg4@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>) at eval.c:507 #36 0xb76957ee in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=thunk@entry=#<smob catch-closure 9ffab20>, handler=handler@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>, pre_unwind_handler=<error reading variable: ERROR: Cannot access memory at address 0x0>) at throw.c:73 #37 0xb7695920 in scm_c_catch (tag=tag@entry=#t, body=body@entry=0xb7608470 <c_body>, body_data=body_data@entry=0xbf875884, handler=handler@entry=0xb7608930 <c_handler>, handler_data=handler_data@entry=0xbf875884, pre_unwind_handler=pre_unwind_handler@entry=0xb7608630 <pre_unwind_handler>, pre_unwind_handler_data=pre_unwind_handler_data@entry=0x9ee2020) at throw.c:207 #38 0xb7608cd7 in scm_i_with_continuation_barrier (body=body@entry=0xb7608470 <c_body>, body_data=body_data@entry=0xbf875884, handler=handler@entry=0xb7608930 <c_handler>, handler_data=handler_data@entry=0xbf875884, pre_unwind_handler=pre_unwind_handler@entry=0xb7608630 <pre_unwind_handler>, pre_unwind_handler_data=0x9ee2020) at continuations.c:455 #39 0xb7608da6 in scm_c_with_continuation_barrier (func=0x80497a4 <workload(void*)>, data=0xbf8759f4) at continuations.c:551 #40 0xb7692b32 in with_guile_and_parent (base=0xbf8758ec, data=0xbf875914) at threads.c:906 #41 0xb7589ac8 in GC_call_with_stack_base (fn=fn@entry=0xb7692ae0 <with_guile_and_parent>, arg=arg@entry=0xbf875914) at misc.c:1840 #42 0xb7692fcf in scm_i_with_guile_and_parent (parent=<optimized out>, data=0xbf8759f4, func=0x80497a4 <workload(void*)>) at threads.c:949 #43 scm_with_guile (func=0x80497a4 <workload(void*)>, data=0xbf8759f4) at threads.c:955 #44 0x0804997d in main (ac=4, av=0xbf8759f4) at test.cc:86 (gdb) dir /home/mhw/gc-7.4.2 Source directories searched: /home/mhw/gc-7.4.2:$cdir:$cwd (gdb) dir /home/mhw/guile-2.0.11/libguile Source directories searched: /home/mhw/guile-2.0.11/libguile:/home/mhw/gc-7.4.2:$cdir:$cwd (gdb) up #1 0xb766dc4a in smob_mark (addr=0xa15f6e0, mark_stack_ptr=0x9e5a320, mark_stack_limit=0x9e62000, env=0) at smob.c:335 warning: Source file is more recent than executable. 335 obj = scm_smobs[smobnum].mark (cell); (gdb) list 330 scm_i_pthread_setspecific (current_mark_stack_pointer, mark_stack_ptr); 331 scm_i_pthread_setspecific (current_mark_stack_limit, mark_stack_limit); 332 333 /* Invoke the SMOB's mark procedure, which will in turn invoke 334 `scm_gc_mark', which may modify `current_mark_stack_pointer'. */ 335 obj = scm_smobs[smobnum].mark (cell); 336 337 mark_stack_ptr = scm_i_pthread_getspecific (current_mark_stack_pointer); 338 339 if (SCM_NIMP (obj)) (gdb) p *(Family *)0x9f9a3b8 $1 = {<Smob<Family>> = {<Smob_base<Family>> = {static smob_tag_ = 7039, static scm_init_ = {<No data fields>}, static smob_name_ = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x9ea1fd4 "Family"}}, static type_p_name_ = <optimized out>, static smob_proc = <optimized out>, static smob_proc_signature_ = <optimized out>}, self_scm_ = #<smob Family a3b28a0>, protection_cons_ = (), mhw_valid = 21345678}, _vptr.Family = 0x804b668 <vtable for Family+8>, kids = {<std::_Vector_base<Family*, std::allocator<Family*> >> = { _M_impl = {<std::allocator<Family*>> = {<__gnu_cxx::new_allocator<Family*>> = {<No data fields>}, <No data fields>}, _M_start = 0x9f9a788, _M_finish = 0x9f9a78c, _M_end_of_storage = 0x9f9a78c}}, <No data fields>}} (gdb) p ((Family *)0x9f9a3b8)->mhw_valid $2 = 21345678 (gdb) p ((Family *)0xa348528)->mhw_valid $3 = 12345678 (gdb) p ((Family *)0xa2a1638)->mhw_valid $4 = 21345678 (gdb) x/2xw addr 0xa15f6e0: 0x00001b7f 0x0a352c58 (gdb) p *(Family *)0x0a352c58 $5 = {<Smob<Family>> = {<Smob_base<Family>> = {static smob_tag_ = 7039, static scm_init_ = {<No data fields>}, static smob_name_ = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x9ea1fd4 "Family"}}, static type_p_name_ = <optimized out>, static smob_proc = <optimized out>, static smob_proc_signature_ = <optimized out>}, self_scm_ = #<smob Family a15f6e0>, protection_cons_ = (), mhw_valid = 87654321}, _vptr.Family = 0xa202ca8, kids = {<std::_Vector_base<Family*, std::allocator<Family*> >> = { _M_impl = {<std::allocator<Family*>> = {<__gnu_cxx::new_allocator<Family*>> = {<No data fields>}, <No data fields>}, _M_start = 0xa352cf8, _M_finish = 0xa352d00, _M_end_of_storage = 0xa352d00}}, <No data fields>}} (gdb) p/x 7039 $6 = 0x1b7f (gdb) --8<---------------cut here---------------end--------------->8--- Regards, Mark